SECTION 1
INTRODUCTION

Distributed Processing Systems are currently receiving a very large
amount of attention. This is due in part to the claims that these systems
will provide a number of advantages over contemporary systems (see Table 1).
Some of the more important potential advantages being publicized are the
following: increased performance (with respect to both throughput and
response time), ability to share resources, ease of system expansion, and
the ability to provide fault tolerance.

Table 1. "Benefits" Provided by Distributed Processing Systems

A Representative List Assembled from Claims Made in
Actual Sales Literature

High Availability and Reliability
Reduced Network Costs
High System Performance
Fast Response Time
High Throughput
Graceful Degradation, Fail-soft
Ease of Modular and Incremental Growth
Configuration Flexibility
Automatic Load and Resource Sharing
Easily Adaptable to Changes in Workload
Incremental Replacement and/or Upgrade
Easy Expansion in Capacity and/or Function
Good Response to Temporary Overloads

This report is concerned with a particular class of distributed
processing systems, "Fully Distributed Processing Systems (FDPS)," which are
the focus of a major research program at the Georgia Institute of Technology.
For a system to be classified as an "FDPS," it must possess all five of the
following characteristics:

1.  Multiplicity of resources: an FDPS is composed of a multiplicity
    of "general-purpose" resources that can be freely assigned on a
    short-term basis to various system tasks as required (e.g.,
    hardware and software processors, shared data bases, etc.).

2.  Component interconnection: the active components in the FDPS
    are physically connected by a communication network(s) utilizing
    two-party, cooperative protocols to control the physical
    transfer of data (i.e., loose physical and logical coupling).

3.  Unity of control: the executive control of an FDPS must define
    and support a unified set of policies governing the operation
    and utilization of all physical and logical resources.

4.  System transparency: users must be able to request services by
    generic names without being aware of their physical location or
    even the fact that multiple copies of the resources may exist.
    (System transparency is designed to aid rather than inhibit
    and, therefore, can be overridden. A user who is concerned
    about the performance of a particular application can provide
    system-specific information to aid in the formulation of
    management control decisions.)

5.  Component autonomy: both the logical and physical components
    of an FDPS should interact in a manner described as
    "cooperative autonomy" [Ensl78]. This means that the components
    operate in an autonomous fashion, requiring cooperation
    among processes for the exchange of information as well as for
    the provision of services. In a cooperatively autonomous
    control environment, the components are afforded the ability to
    refuse requests for service, regardless of whether the service
    request involves execution of a process or the use of a file.
    This could result in anarchy except for the fact that all
    components adhere to a common set of system utilization and
    management policies expressed by the philosophy of the
    executive control.

A  more  detailed  explanation  of  these  characteristics  is  found  in  Section  2  of 
this  report. 

An essential component of an FDPS is the distributed and decentralized
control. This component unifies the management of the resources of the FDPS
and provides system transparency to the user. A previous study (see [Ensl81])
examined the characteristics of various models of distributed and
decentralized control that met these criteria and identified a number of
variations possible in specific features of the different models. That
research helped to define more clearly the exact nature of the operation of an
FDPS, the problems inherent in distributed and decentralized control, and
possible solutions to these problems.

The scope and goal of the present work is to evaluate, both qualitatively
and quantitatively, the effect of these features on the performance of the
various models of control. The qualitative evaluation is intended to
demonstrate how a particular model performs in a specific environment. In
this phase, the validity of a model is established. The quantitative
evaluation, on the other hand, is intended to examine in general the relative
merits of decentralized control and provide data to support conclusions about
the relative performance of the various models.

SECTION 2
BACKGROUND


2.1 THE DEFINITION OF AN FDPS

Fully Distributed Processing Systems (FDPS) were first defined by Enslow
in 1976 [Ensl78], although the designation "fully" was not added until 1978,
when it became necessary to clearly distinguish this specific class of systems
from the many others being presented as "distributed processing systems." As
discussed in Section 1, an FDPS is distinguished by the following
characteristics:

1.  Multiplicity  of  resources. 

2.  Component  interconnection. 

3.  Unity  of  control. 

4.  System  transparency. 

5.  Component  autonomy. 

It is important to note that for a system to qualify as fully
distributed, it must satisfy all five of the criteria presented in this
definition.

2.1.1 Multiple Resources and Their

The requirement for resource multiplicity concerns the assignable
resources that a system provides. Therefore, the type of resources requiring
replication depends on the purpose of a system. For example, a distributed
system designed to perform real-time computing for air traffic control
requires a multiplicity of special-purpose air traffic control processors and
display terminals. It is not required that replicated resources be exactly
homogeneous; instead, they must be capable of providing the same services.

In addition to the requirement for multiplicity, the system resources
must be dynamically reconfigurable to respond to component failures as well as
changes in the work load presented to the system. This reconfiguration must
occur within a "short" period of time so as to maintain the functional
capabilities of the overall system without affecting the operation of
components not directly involved. Under normal operation, the system must be
able to dynamically assign its tasks to components distributed throughout the
system.

The extent to which resources are replicated can range from those
systems where none are replicated (not a fully distributed system) to systems
with all assignable resources replicated. In addition, the number of copies
of a particular resource can vary depending on the system and type of
resource. In general, the greater the degree of replication, particularly of
resources in high demand, the greater the potential for attaining benefits
such as increased performance (response time and throughput), availability,
reliability, and flexibility [Ensl78].

2.1.2  Component  Interconnection  and  Communication 

The extent of physical distribution of resources in distributed systems
can range from the length of a connection between components on a single
integrated chip to the distance between two computers communicating through an
international network. In addition, interconnection subsystem organizations
can vary from a single time-shared bus to a complex, mesh interconnection
network. Since a component in a distributed system communicates with other
components through its own logical process, all physical and logical resources
can be thought of as processes, and interactions between resources can be
referred to as interprocess communication [Davi79]. For example, application
program interaction with data files is accomplished through communication
between two logical processes, the application process and the file process.

In an FDPS, both the physical and logical coupling of the system
components are characterized as "extremely loose." "Gated" or "master-slave"
control of physical transfers is not allowed. Communication (i.e., the
physical transfer of messages) is accomplished through the active cooperation
and participation of both the sender and addressees. The primary requirement
of the interconnection subsystem is that it support such a two-party
cooperative protocol. This is essential to enable the system's resources to
exist with "cooperative autonomy" at the physical level.

The advantages of using a message-based (loosely-coupled) communication
system with a two-party cooperative protocol include reliability,
availability, and extensibility. The disadvantage is the additional overhead
of message processing incurred to support this method of communication. There
are a variety of interconnection organizations and communication techniques
that can be used to support a message-based system with a two-party
cooperative protocol.

2.1.3 Unity of Control

In a fully distributed data processing system, individual processors
will control local resources with their own local operating systems, which may
or may not be unique. As a result, control is distributed throughout the
system to control system components that operate autonomously. However, to
gain the benefits of distributed processing, it is required that the
autonomous components of the system cooperate with each other to achieve the
overall objectives of the system. To ensure this, the concept of a high-level
operating system was created to integrate and unify, at least conceptually,
the decentralized control of the system.

A high-level operating system is essential to the successful
implementation of a distributed processing system. The high-level operating
system is not a centralized block of code exercising strong hierarchical
control over the system; instead, it is a well-defined set of policies
governing the integrated operation of the system as a whole. To ensure
reliable and flexible operation of the system, these policies should be
implemented with minimal binding to any of the system's components [Ensl78].

What policies are required and how they should be implemented depends
greatly on the system. For example, if it is a general-purpose system
supporting interactive users, then a command interpreter and a user control
language are required to make the system's components compatible and
transparent to the user.

2.1.4 System Transparency

The  high-level  operating  system  also  provides  the  user  with  an  interface 
to  the  distributed  system.  As  a  result,  the  user  is  accessing  the  system  as  a 
whole  rather  than  just  a  single  computer  in  the  network. 

In order to increase the effectiveness of the distributed system, the
actual system organization is made transparent. The user is presented with a
virtual machine and a command language to access it. Using this command
language, the user requests services by name and does not need to specify the
specific server to be used. Clearly, multiple requests for the same service
might be assigned to different servers depending on the state of the total
system when the request is made. However, to make the system truly effective
for all users, knowledgeable individuals must be able to interact with the
system more directly, requesting specific servers or developing service
routines to increase the efficiency or effectiveness of the system [Ensl78].
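
The request-by-name behavior described above can be illustrated with a small
sketch. It is not taken from the report: the Server record, the node names,
and the "least reported load" selection rule are all assumptions made for
illustration, as is the optional preferred_node argument standing in for the
knowledgeable user's override.

```python
from dataclasses import dataclass


@dataclass
class Server:
    node: str      # hypothetical node identifier
    service: str   # generic service name offered by this server
    load: int      # last load figure known to the control (may be stale)


def select_server(servers, service, preferred_node=None):
    """Resolve a generic service name to one server instance.

    Assumed policy: the least-loaded instance wins, unless the user
    explicitly names a node, which overrides transparency.
    """
    candidates = [s for s in servers if s.service == service]
    if not candidates:
        raise LookupError(f"no server offers {service!r}")
    if preferred_node is not None:
        for s in candidates:
            if s.node == preferred_node:
                return s
        raise LookupError(f"{service!r} is not available on {preferred_node!r}")
    return min(candidates, key=lambda s: s.load)


servers = [
    Server("node-a", "compile", load=5),
    Server("node-b", "compile", load=2),
    Server("node-c", "print", load=0),
]
chosen = select_server(servers, "compile")                           # transparent request
pinned = select_server(servers, "compile", preferred_node="node-a")  # user override
```

The same request can land on different servers as the load figures change,
which is the effect the text attributes to the state of the total system.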

2.1.5  Cooperative  Autonomy 

Cooperative autonomy has already been described at the physical
interconnection level. It is also required that all resources be autonomous
at the logical control level. A resource must have complete control in
determining which requests it will service and what future operations it will
perform. However, a resource must also cooperate with other resources by
operating according to the policies of the high-level operating system.
Cooperative autonomy is an essential prerequisite for systems to have fault
tolerance and high degrees of extensibility [Ensl76]. It is perhaps the most
important and most distinguishing characteristic of a fully distributed
processing system.

2.2 CHARACTERIZATION OF DISTRIBUTED AND DECENTRALIZED CONTROL

2.2.1 General Nature of FDPS Executive Control

The executive control is responsible for managing the resources of the
FDPS. Its charter is to perform the management function in such a manner that
the resources of the FDPS are unified and users of the FDPS are shielded from
the physical realities of distribution. In other words, the executive control
provides system transparency for the user.

The executive control of an FDPS can be implemented in many different
ways. It can consist of identical modules replicated on all nodes of the
system. Alternatively, it can consist of several unique modules distributed
in some manner about the system. The essential point is that the term
"executive control" does not necessarily mean a particular module at a
particular node, but rather the entire collection of modules that are
distributed somehow throughout the system and are working together to manage
the system's resources.

2.2.2 Control Problems Resulting from the FDPS Environment

Several characteristics of an FDPS are found to directly impact the
design and implementation of the executive control. These include system
transparency to the user, extremely loose physical and logical coupling, and
cooperative autonomy as the basic mode of component interaction. System
transparency means that the FDPS appears to a user as a large uniprocessor
which has available a variety of services. It must be possible for the user
to obtain these services by naming them without specifying any information
concerning the details of their physical location. The task of locating all
appropriate instances (copies) of a particular resource and choosing the
instance to be utilized is left to the executive control.

"Cooperative autonomy" is another characteristic of an FDPS that has a
large effect on the design of the executive control. The "lower-level"
control functions of both the logical and physical resource components of an
FDPS are designed to operate in a "cooperatively autonomous" fashion. Thus,
the executive control must be designed such that any resource is able to
refuse a request even though it may have physically accepted the message
containing that request. Degeneration into total anarchy is prevented by the
establishment of a common set of criteria to be followed by all resources in
determining whether a request is accepted and serviced as originally
presented, accepted only after bidding or negotiation, or rejected.
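
The three possible outcomes named above (accepted as presented, accepted
only after bidding or negotiation, rejected) can be sketched as follows.
This is an illustrative sketch only: the report does not specify the common
criteria, so the queue-length thresholds and all names here are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accepted as presented"
    NEGOTIATE = "accepted only after bidding or negotiation"
    REJECT = "rejected"


def common_policy(queue_length, soft_limit=2, hard_limit=4):
    """Shared criteria applied locally by every resource (assumed thresholds)."""
    if queue_length < soft_limit:
        return Decision.ACCEPT
    if queue_length < hard_limit:
        return Decision.NEGOTIATE
    return Decision.REJECT


class Resource:
    def __init__(self, name):
        self.name = name
        self.queue = []

    def receive(self, request):
        # Physically receiving the message does not commit the resource:
        # the decision is made here, autonomously, but under the common policy.
        decision = common_policy(len(self.queue))
        if decision is Decision.ACCEPT:
            self.queue.append(request)
        return decision


disk = Resource("disk-1")
outcomes = [disk.receive(f"req-{i}") for i in range(3)]
```

Because every resource evaluates the same policy function, each one remains
free to refuse work while the system as a whole behaves predictably.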

Another important FDPS characteristic that definitely affects the design
of its executive control is the extremely loose coupling of both physical and
logical resources. The components of an FDPS are connected by communication
paths of relatively low bandwidth. The direct sharing of primary memory
between processors is not acceptable. Even though the logical coupling could
still be loose with shared memory as the physical interconnection mechanism,
the presence of a single critical hardware element, the shared memory, would
create fault-tolerance limitations. Therefore, all communication takes place
over "standard" input/output paths. The actual data rates that can be
supported are primarily a function of the interconnections between the
processors and the capability of their input/output paths. The available
transfer rates are much less than memory transfer rates. This implies that
the sharing of control information among components on different processors
is greatly restricted. System control is forced to work with information that
is "out-of-date" and, as a result, perhaps "inaccurate."
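
One way to picture control working with "out-of-date" information is a
per-node table of timestamped reports. The sketch below is an assumption made
for illustration, not a mechanism described in the report; LoadReport,
StateView, and the max_age cutoff are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LoadReport:
    node: str
    load: int
    sent_at: float  # time at which the remote node generated the report


class StateView:
    """One control component's local, possibly stale, view of remote nodes."""

    def __init__(self):
        self.reports = {}

    def update(self, report):
        # Messages may arrive late or out of order; keep the newest report only.
        current = self.reports.get(report.node)
        if current is None or report.sent_at > current.sent_at:
            self.reports[report.node] = report

    def age(self, node, now):
        """How old the information about a node is at time `now`."""
        return now - self.reports[node].sent_at

    def usable(self, node, now, max_age):
        """True if a report exists and is no older than max_age (assumed cutoff)."""
        report = self.reports.get(node)
        return report is not None and now - report.sent_at <= max_age


view = StateView()
view.update(LoadReport("n1", load=3, sent_at=10.0))
view.update(LoadReport("n1", load=7, sent_at=8.0))  # older report, ignored
```

Two control components holding such tables will generally disagree about
the state of the same node, since their reports arrive at different times.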

The control of an FDPS requires the participation and cooperation of
components at all layers of the system. This implies that there are elements
of FDPS control present in the lowest levels of the hardware and software
components. This study is primarily interested in the software components of
the FDPS control, which are typically referred to as "the executive control."
Low-level aspects of FDPS control will not be directly examined.

The executive control is responsible for managing the physical and
logical resources of a system. It accepts user requests and obtains and
schedules the resources necessary to satisfy a user's needs. The manner in
which these tasks are accomplished is designed to unify the distributed
components of the system into a whole and provide system transparency to the
user.

2.2.3 Why Not Centralized Control?

Why is a centralized method of control not appropriate? In systems
utilizing a centralized executive control, all of the control processes share
a single, coherent, and accurate view of the entire system state. An FDPS,
though, contains only loosely-coupled components, the communication between
which is limited and subject to variable time delays. This means that one
cannot guarantee that all control processes will have the same view of the
system state [Jens78]. In fact, it is a significant characteristic of an FDPS
that all control processes will probably not have a consistent view.

A centralized executive control weakens the fault tolerance of the
overall system due to the existence of a single critical element, the
executive control component itself. This obstacle, though, is not
insurmountable. Strategies do exist for providing fault tolerance in
centralized applications. Garcia-Molina [Garc79], for example, has described
a scheme for providing fault tolerance in a distributed data base management
system with a centralized control. Approaches of this type typically assume
that failures are extremely rare events and that the system can tolerate the
dedication of a relatively long interval of time to reconfiguration. These
restrictions may be unacceptable in an FDPS environment in which it is
important to provide fault tolerance with a minimum of disruption to the
services being supported.

Also, the extremely important issue of overall system performance must
be considered. A distributed processing system is expected to utilize a large
quantity and a wide variety of resources. If a completely centralized
executive control is implemented, there is a high probability that a
bottleneck will be created in the node executing the control functions. A
distributed and decentralized approach to control attempts to remove this
bottleneck by dispersing the control decisions among multiple components on
different nodes.

2.2.4 Distributed vs. Decentralized

The discussion above supports the requirement that the executive control
of an FDPS must be both "distributed" and "decentralized," and it should be
noted that there is a clear distinction between the terms "distributed
control" and "decentralized control" as they are used in the context of this
project. "Distributed control" is characterized by having its executing
components physically located on different nodes. This means there are
multiple loci of control activity. In "decentralized control," on the other
hand, control decisions are made independently by separate components. In
other words, there are multiple loci of control decision making. Thus,
distributed and decentralized control has active components located on
different nodes, and those components are capable of making independent
control decisions.

2.2.5 Rationale Behind Distributed and Decentralized Control

The reasons for distributing and decentralizing control result from two
basic goals of an FDPS: to improve performance and to provide a more
fault-tolerant system. With decentralized decision making, a system can
potentially provide responses to requests in a shorter amount of time due to
the increased utilization of resources, which is achieved through the
concurrent execution of the decentralized decision makers.

By physically distributing components, one is assured that a system
retains the potential to keep running even though some parts have been lost.
The ability to function independently of the lost components is provided by
decentralized decision making. Thus, by distributing components and
decentralizing decision making, the potential for fault-tolerant operation is
provided.

2.3 EVALUATION

The steps performed in the evaluation of the models of control are as
follows:

1.  Prepare detailed definitions of the models of control.

2.  Construct an FDPS simulator.

3.  Perform the simulation experiments.

4.  Validate the control models.

5.  Compare the relative performance data for the different control
    models.

2.3.1  Definition  of  Control  Models 

The first step in the evaluation process is to define in greater detail
the models of control originally described in [Ensl81]. One of the goals of
the present research is to validate the control models in order to examine
their performance in certain environments. By looking at the finer details of
the models, significant control problems have been discovered which were not
apparent from earlier high-level studies.

To accomplish this detailed study, the models are translated into a
high-level programming language, Pascal. The resulting code is presented in
Appendix 1 in the form of pseudo code. The pseudo code is derived from the
actual Pascal code and is presented in place of the actual code in order to
conserve space.

2.3.2 Construction of a Simulator

In order to perform both validation and performance analysis, it is
necessary to construct an FDPS simulator. The models of control are
translated into Pascal, and the resulting code is incorporated into the
simulator. Validation is accomplished by constructing various test cases
which are designed to exercise the particular executive control functions
being tested. A detailed transaction log is maintained in order to follow the
actions of the simulator and, thus, verify the correct or incorrect
performance of each portion of the executive control.

The simulator also collects various performance measurements. These are
processed at the termination of the experiment in order to generate
performance statistics. The interval during which measurements are collected
is user controllable. This allows one to measure steady state values as well
as performance during startup.

2.3.3 Simulation Experiments

Simulation experiments are conducted in two phases. The first phase is
designed to validate the various models of control. In these experiments,
there is no need to collect performance measurements; instead, a detailed log
of the simulator's actions is maintained. This is then analyzed in order to
observe the behavior of the control model under test.

In the second phase of experiments, performance measurements are
collected, but no transaction log is maintained. These experiments are used
to obtain data concerning the relative performance of the various models of
control. In order to obtain steady state data, measurements are not collected
until some time after startup. Several simulations are performed on each
model of control. Each simulation provides the control with a different
environment. To obtain different environments, the interconnection topology
and the bandwidths of the communication links are varied.

The load for the simulator is generated in the following manner. The
user-specified configuration determines the number of nodes, the connectivity
of these nodes, the number of terminals attached to each node, and the initial
state of the file system. The file system includes data files, command files,
and object files. Each object file specifies a script of actions to be
simulated in order to simulate the execution of a user process. The user of
the simulator provides a series of commands that can originate from a
terminal. These commands form a population of commands from which the load
generator randomly selects commands for arrival from specific terminals. The
time of command arrival is determined by generating a random number from a
particular interval marked by a minimum and a maximum time delay between
submission of commands.
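
The load generation scheme just described can be sketched briefly. The
report's simulator is written in Pascal and its code is not reproduced here;
the Python below is a minimal illustration under stated assumptions: delays
are drawn uniformly from the [min_delay, max_delay] interval, each terminal
submits a fixed number of commands, and the function and parameter names are
hypothetical.

```python
import random


def generate_load(commands, terminals, min_delay, max_delay,
                  per_terminal, seed=0):
    """Return (arrival_time, terminal, command) tuples sorted by time.

    Each command is drawn at random from the command population, and
    successive commands from one terminal are separated by a random
    delay between min_delay and max_delay (uniform draw is an assumption).
    """
    rng = random.Random(seed)  # seeded so an experiment can be repeated
    arrivals = []
    for terminal in terminals:
        clock = 0.0
        for _ in range(per_terminal):
            clock += rng.uniform(min_delay, max_delay)
            arrivals.append((clock, terminal, rng.choice(commands)))
    arrivals.sort(key=lambda a: a[0])
    return arrivals


population = ["edit", "compile", "run"]
load = generate_load(population, ["t1", "t2"], min_delay=1.0,
                     max_delay=3.0, per_terminal=4)
```

Fixing the seed makes it possible to present different control models with
an identical sequence of arrivals, which helps when comparing them.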

2.3.4 Validation of Control Models

Validation of the models of control is achieved by constructing input
scripts designed to exercise the particular executive control being tested.
The resulting transaction log is analyzed to ensure the correct performance of
the executive control.

2.3.5 Comparison of Relative Performance Models

After each test, the data reduction portion of the simulator utilizes
the performance measurements gathered during the specified interval of time to
compute the following statistics:

1.  The average service time for a user session, for a work
    request, and for a process. (This is computed for all nodes
    and also averaged over all nodes.)

2.  The average response time for a user session, for a work
    request, and for a process. (This is computed for all nodes
    and also averaged over all nodes.)

3.  The throughput for user sessions, for work requests, and for
    processes. (This is computed for all nodes and also averaged
    over all nodes.)

4.  For the READY QUEUE on each node, the MESSAGE BLOCKED QUEUE on
    each node, each DISK WAITING QUEUE on each node, and each LINK
    QUEUE on each node, the following statistics are compiled:

    a.  The minimum time spent by a process in the queue.

    b.  The maximum time spent by a process in the queue.

    c.  The average time spent by a process in the queue.

    d.  The minimum queue length observed by a process
        entering the queue.

    e.  The maximum queue length observed by a process
        entering the queue.

    f.  The average queue length observed by a process
        entering the queue.

5.  The number of user messages, control messages, and the total
    number of messages sent from each node to every other node.

6.  The number of user messages, control messages, and the total
    number of messages sent on each link.
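
The six per-queue statistics in item 4 reduce to minimum, maximum, and mean
over two recorded series. A minimal sketch follows, assuming one record is
kept per process that passes through a queue; the QueueStats class and its
method names are hypothetical and are not the simulator's actual data
reduction code.

```python
from dataclasses import dataclass, field


@dataclass
class QueueStats:
    waits: list = field(default_factory=list)    # time each process spent queued
    lengths: list = field(default_factory=list)  # queue length seen on entry

    def record(self, enter_time, leave_time, length_on_entry):
        self.waits.append(leave_time - enter_time)
        self.lengths.append(length_on_entry)

    def summary(self):
        if not self.waits:
            raise ValueError("no measurements recorded for this queue")
        n = len(self.waits)
        return {
            "min_wait": min(self.waits),
            "max_wait": max(self.waits),
            "avg_wait": sum(self.waits) / n,
            "min_length": min(self.lengths),
            "max_length": max(self.lengths),
            "avg_length": sum(self.lengths) / n,
        }


ready_queue = QueueStats()
ready_queue.record(enter_time=0.0, leave_time=2.0, length_on_entry=0)
ready_queue.record(enter_time=1.0, leave_time=5.0, length_on_entry=1)
ready_queue.record(enter_time=3.0, leave_time=4.0, length_on_entry=2)
stats = ready_queue.summary()
```

One such object per queue per node would be enough to produce items a
through f; restricting calls to record() to the measurement interval would
give the steady state figures.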

Utilizing  these  statistics,  conclusions  concerning  the  relative  merits 
of  each  of  the  models  of  control  are  made. 

2.4 PROJECT SCOPE AND ORGANIZATION OF THIS REPORT

Following these first two sections of introductory remarks, this paper
examines in finer detail the models initially presented in [Ensl81]. Section
3 contains a description of the more important features of the control models
under examination. A pseudo code description of these models is provided in
Appendix 1.

The simulator used in the evaluation of the models is the topic of
discussion in Section 4. In this section, the goals of the simulation
experiments, requirements for the simulator, and the structure of the
simulator are discussed.

In Section 5, the results of the simulation experiments are examined.
This includes discussions of both the validity of the models in certain
environments and the relative performance of the various models of control.

Conclusions about the results of the evaluation studies are presented in
Section 6. The results of these experiments are summarized and placed into
proper perspective, and further questions that this study stimulated but
failed to answer are identified.

SECTION 3
MODELS OF CONTROL

This  research  considers  six  different  models  of  control.  These  models 
are  described  in  general  terms  in  this  section,  and  pseudo  code  for  the  models 
is provided in Appendix 1. The models are similar in many respects, differing
usually only in some particular aspect of control. Therefore, only the first
model  is  presented  completely.  The  others  are  described  by  indicating  how 
they  differ  from  the  first  model. 

3.1 THE XFDPS.1 CONTROL MODEL

The XFDPS.1 control model was first defined in [Sapo80] and further
refined in [Ensl81]. With the aid of a simulation environment, this model has
been even more completely defined. The XFDPS.1 model is composed of six types
of components: TASK SET MANAGERS, FILE SYSTEM MANAGERS, FILE SET MANAGERS,
PROCESSOR  UTILIZATION  MANAGERS,  PROCESSOR  UTILIZATION  MONITORS,  and  PROCESS 
MANAGERS.  (See  Figure  1.)  The  basic  strategy  of  this  model  of  control  is  to 
partition  the  system's  resources  and  assign  separate  components  to  manage  each 
partition. 

3.1.1 Task Set Manager

A  TASK  SET  MANAGER  is  assigned  to  each  user  terminal  as  well  as  to  each 
executing  command  file.  The  name  TASK  SET  MANAGER  results  from  the  nature  of 
user  work  requests  which  originate  from  user  terminals  and  command  files.  The 
work  requests  specify  one  or  more  executable  files  called  tasks  (these  contain 
either object code or commands) and any input or output files used by the
tasks. It is possible for the tasks of a work request to communicate, and
this communication (task connectivity) is also described by the work request.
Therefore, each work request specifies a set of tasks, and it is the job of
the TASK SET MANAGER to control the execution of that set of tasks.

When  a  work  request  arrives,  the  TASK  SET  MANAGER  parses  the  work 
request  and  initiates  construction  of  the  task  graph  for  this  work  request. 
In XFDPS.1, only a single copy of the task graph is maintained. This copy is
stored at the node where the TASK SET MANAGER for the work request resides.
At this stage of work request processing, the task graph contains the initial
resource requirements for the work request.

TASK SET MANAGER - 1 per user terminal
or  executing  command  file 

FILE  SYSTEM  MANAGER  -  1  per  node 

FILE  SET  MANAGER  -  1  per  node 

PROCESSOR  UTILIZATION  MANAGER  -  1  per  node 

PROCESSOR  UTILIZATION  MONITOR  -  1  per  node 

PROCESS  MANAGER  -  1  per  node 


Figure  1.  The  XFDPS.1  Model  of  Control 


In the next step, a message is sent to the FILE SYSTEM MANAGER residing
on the same node as the TASK SET MANAGER requesting file availability
information concerning the files needed by the work request. A message is also sent
to the PROCESSOR UTILIZATION MANAGER residing on the same node as the TASK SET
MANAGER  requesting  processor  utilization  information.  This  includes  the 
latest  utilization  information  that  this  particular  node  has  obtained  from  all 
other  nodes. 

When the file availability information and processor utilization
information  arrive,  a  work  distribution  and  resource  allocation  decision  is 
made  by  the  TASK  SET  MANAGER.  At  this  point,  specific  files  are  chosen  from 
the  list  of  files  found  available  and  specific  processors  are  chosen  as  sites 
for  the  execution  of  the  various  tasks  of  the  work  request's  task  set.  In 
this  study  no  attempt  is  made  to  investigate  different  strategies  for 
distributing  work;  instead,  a  single  strategy  is  used  for  all  experiments. 
(Other  work  in  progress  in  the  FDPS  Research  Program  at  Georgia  Tech  is 
examining  the  complete  area  of  work  distribution  and  resource  allocation.)  In 
this strategy, a process is assigned to execute on the node where its
object code resides. Data files are not moved but are accessed from the node on
which they originally resided.

Once the allocation decision is made, a request for the locking of the
chosen files is sent by the TASK SET MANAGER to the FILE SYSTEM MANAGER
residing on the same node as the TASK SET MANAGER. The desired type of access
(READ  or  WRITE)  is  also  passed  along  with  the  lock  request.  Multiple  readers 
are  permitted,  but  readers  are  denied  access  to  files  already  locked  for 
writing,  and  writers  are  denied  access  to  files  locked  for  reading  or  writing. 
If  the  FILE  SYSTEM  MANAGER  informs  the  TASK  SET  MANAGER  that  all  the  desired 
files  have  been  successfully  locked,  execution  of  the  work  request  can  be 
initiated.  If  the  locking  operation  is  not  successful,  the  work  request  is 
aborted,  and  the  necessary  cleanup  operations  are  performed.  The  next  step 
after  successful  file  allocation  is  to  send  a  series  of  messages  to  the 
PROCESS  MANAGERS  on  the  various  nodes  that  have  been  chosen  to  execute  the 
tasks of the task set, informing them that they are to execute a specific
subset of tasks.
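
The locking rules described above (shared readers, exclusive writers, and abort if any lock fails) can be illustrated with a minimal sketch. The class and method names here are hypothetical and are not taken from the report's Pascal simulator; the sketch also assumes each file appears at most once per request.

```python
class LockTable:
    """Per-file lock state: any number of readers or exactly one writer."""

    def __init__(self):
        self.readers = {}     # file name -> count of READ locks held
        self.writers = set()  # file names currently locked for WRITE

    def can_lock(self, name, mode):
        if mode == "READ":
            # Readers are denied only when the file is locked for writing.
            return name not in self.writers
        # Writers are denied when the file is locked for reading or writing.
        return name not in self.writers and self.readers.get(name, 0) == 0

    def lock_all(self, requests):
        """Lock every (name, mode) pair or none of them.

        Returns True on success. False corresponds to the case in which
        the work request is aborted. Assumes each file name appears once.
        """
        if not all(self.can_lock(name, mode) for name, mode in requests):
            return False
        for name, mode in requests:
            if mode == "READ":
                self.readers[name] = self.readers.get(name, 0) + 1
            else:
                self.writers.add(name)
        return True
```

A second reader of a file is admitted, while a writer arriving after a reader is refused and leaves the table unchanged.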

When  a  task  terminates,  its  PROCESS  MANAGER  reports  back  to  the  TASK  SET 
MANAGER  and  indicates  the  reason  for  the  termination  (normal  or  abnormal). 
When  an  indication  of  an  abnormal  termination  is  received,  the  remaining 
active tasks of the task set are terminated.

After all tasks of a task set have terminated, one of three possible
actions occurs. If the source of commands is a user terminal, the user is
prompted  for  a  new  command.  If  the  source  is  a  command  file,  the  next  command 
is  obtained.  Finally,  if  the  source  is  a  command  file  and  all  the  commands 
have  been  executed,  the  TASK  SET  MANAGER  is  deactivated  and  the  PROCESS 
MANAGER  on  the  node  where  the  command  file  was  being  executed  is  informed  of 
the  termination  of  the  command  file. 

3.1.2 File System Manager

Replicated  on  each  node  of  the  system  is  a  component  called  the  FILE 
SYSTEM  MANAGER.  This  module  handles  the  file  system  requests  from  all  of  the 
TASK  SET  MANAGERS  including  requests  for  file  availability  information  and 
requests  to  lock  or  release  files.  FILE  SYSTEM  MANAGERS  do  not  possess  any 
directory  information.  Therefore,  to  locate  a  file,  it  is  necessary  that  all 
nodes  are  queried  as  to  the  availability  of  the  file. 

The  FILE  SYSTEM  MANAGER  satisfies  the  requests  by  consulting  with  the 
FILE SET MANAGERS (see Section 3.1.3) located on each node of the system. For
example,  when  the  FILE  SYSTEM  MANAGER  receives  a  request  for  file  availability 
information,  messages  are  prepared  and  sent  to  all  FILE  SET  MANAGERS.  The 
FILE  SYSTEM  MANAGER  collects  the  responses,  and  when  responses  from  all  FILE 
SET  MANAGERS  have  been  obtained,  it  reports  the  results  to  the  TASK  SET 
MANAGER which made the request. Requests for the locking or releasing of
files  are  handled  in  a  similar  manner. 

3.1.3 File Set Manager

The  files  residing  on  each  node  of  the  system  are  managed  separately 
from  the  files  on  other  nodes  by  a  FILE  SET  MANAGER  that  is  dedicated  to 
managing  that  set  of  files.  The  duties  of  the  FILE  SET  MANAGER  include 
providing file availability information to inquiring FILE SYSTEM MANAGERS and
reserving, locking, and releasing files as requested by FILE SYSTEM MANAGERS.
It should be noted that a side effect of gathering file availability
information is the placement of a reservation on a file that is found to be
available.
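
As a rough illustration of the query pattern in Sections 3.1.2 and 3.1.3, the sketch below has a FILE SYSTEM MANAGER routine ask every FILE SET MANAGER about every file, with a reservation placed as a side effect of a successful query. All names are hypothetical, and reservations are simplified to a single flag per file rather than being tracked per requester.

```python
class FileSetManager:
    """Manages the files residing on one node."""

    def __init__(self, node, files):
        self.node = node
        self.files = set(files)
        self.reserved = set()

    def query(self, name):
        """Report availability; an available file is reserved as a side effect."""
        if name in self.files and name not in self.reserved:
            self.reserved.add(name)
            return True
        return False


def file_availability(file_set_managers, names):
    """Query all nodes for all files and collect the responses.

    Returns {file name: [nodes on which the file was found available]}.
    """
    found = {name: [] for name in names}
    for fsm in file_set_managers:   # no directory exists, so every node is asked
        for name in names:
            if fsm.query(name):
                found[name].append(fsm.node)
    return found
```

Because the first query reserves what it finds, a second query for the same file reports nothing available until the reservation is released.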

3.1.4 Processor Utilization Manager

Also present on each node is another component of the executive control,
the PROCESSOR UTILIZATION MANAGER. This module is assigned the task of
collecting and storing processor utilization information which is obtained from
the PROCESSOR UTILIZATION MONITORS (see Section 3.1.5) residing on each of the
nodes.  When  a  TASK  SET  MANAGER  asks  the  PROCESSOR  UTILIZATION  MANAGER  for 
utilization  information,  the  PROCESSOR  UTILIZATION  MANAGER  responds  with  the 
data  available  at  the  time  of  the  query. 

3.1.5 Processor Utilization Monitor

Each  node  of  the  system  also  has  a  PROCESSOR  UTILIZATION  MONITOR  that  is 
responsible for collecting various measurements needed to arrive at a value
describing the current utilization of the processor on which the PROCESSOR
UTILIZATION  MONITOR  resides.  The  processor  utilization  value  is  periodically 
transmitted  to  the  PROCESSOR  UTILIZATION  MANAGERS  on  all  nodes. 
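
The relationship between monitors and managers can be sketched as follows. This is an illustrative model only, with hypothetical names: each monitor pushes its node's value to every manager, and a manager answers a query with whatever it last received, which may be out of date.

```python
class UtilizationManager:
    """Stores the most recent utilization value reported by each node."""

    def __init__(self):
        self.latest = {}   # node id -> last reported utilization

    def report(self, node, value):
        self.latest[node] = value

    def snapshot(self):
        # Data available at the time of the query; no fresh poll is made.
        return dict(self.latest)


class UtilizationMonitor:
    """Measures one node and transmits the value to all managers."""

    def __init__(self, node, managers):
        self.node = node
        self.managers = managers

    def broadcast(self, value):
        for manager in self.managers:
            manager.report(self.node, value)
```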

3.1.6 Process Manager

Residing on each node of the system is a PROCESS MANAGER whose function
is  to  supervise  the  execution  of  processes  executing  on  the  node  on  which  it 
resides.  The  PROCESS  MANAGER  is  responsible  for  activating  and  deactivating 
processes.  If  the  execution  file  for  a  process  is  an  object  file,  the  PROCESS 
MANAGER  will  load  the  object  file  into  memory.  This  file  may  reside  either 
locally  or  on  a  distant  node.  If  the  execution  file  is  a  command  file,  the 
PROCESS  MANAGER  sees  that  a  TASK  SET  MANAGER  is  activated  to  respond  to  the 
commands  of  that  command  file.  The  PROCESS  MANAGER  is  also  responsible  for 
handling  process  termination.  This  involves  releasing  local  resources  held  by 
the process and informing the TASK SET MANAGER that requested the execution of
the  process  as  to  the  termination  of  the  process. 

3.1.7 File Process

In order to provide file access in a manner that is uniform with the
operation of the rest of the system, another type of control process is
utilized, the FILE PROCESS. For each access to a file, an instance of a FILE
PROCESS is created. Therefore, if process "A" is accessing file "X" and
process "B" is also accessing file "X", there will be two instances of a FILE
PROCESS, each responsible for a particular access to file "X". Communication
between  FILE  PROCESSes  and  user  processes  (file  reads  and  writes)  or  between 
FILE  PROCESSes  and  PROCESS  MANAGERS  (loading  of  object  programs)  is  handled  in 
the same manner as communication between user processes.

3.2 THE XFDPS.2 CONTROL MODEL


The XFDPS.2 model of control differs from the XFDPS.1 model in the
manner in which file management is conducted. In this model a centralized
directory is maintained. In Appendix 1 the component named FILE SYSTEM MANAGER
maintains this directory. This component resides on only one node, the node
where the file system directory is maintained. TASK SET MANAGERS communicate
directly with this component in order to gain availability information, lock
files, or release files.


When a file is locked it is necessary to create a FILE PROCESS in order
to provide access to the file. To accomplish this task, the FILE SYSTEM
MANAGER sends a message to the node where the file resides requesting
activation of a FILE PROCESS providing access to the file. Once this process is
created, the FILE SYSTEM MANAGER is given the name of the FILE PROCESS which
it then returns to the TASK SET MANAGER that requested the file lock.


3.3 THE XFDPS.3 CONTROL MODEL

In the XFDPS.1 model of control a search for file availability
information encompassing all nodes is conducted for each work request. Obtaining
this global information is important when one is attempting to obtain optimal
resource allocations. In those instances where this is not important a slight
variation on the search strategy may be utilized. This strategy is the
distinguishing feature of the XFDPS.3 model of control.

Instead of immediately embarking on a global search, a search of local
resources (i.e., resources that reside on the same node where the work request
originated) is conducted. If all of the required resources are located, no
further searches are conducted, and the operations of locking files,
activating processes, etc., described for model XFDPS.1 are executed. If on the other
hand all required resources could not be found, the strategy of model XFDPS.1
is utilized.
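
A minimal sketch of the local-first search follows. The function name and the dictionary representation of file placement are assumptions made for illustration; the report's own pseudo code is in Appendix 1.

```python
def locate_files(needed, local_node, files_by_node):
    """Search the originating node first, then fall back to all nodes.

    files_by_node maps node id -> set of file names held on that node.
    Returns (scope, {file name: [nodes holding it]}).
    """
    local = files_by_node.get(local_node, set())
    if all(name in local for name in needed):
        # Everything was found locally, so no global search is made.
        return "local", {name: [local_node] for name in needed}
    # Otherwise use the XFDPS.1 strategy and consult every node.
    found = {
        name: [node for node, held in sorted(files_by_node.items()) if name in held]
        for name in needed
    }
    return "global", found
```

Note that a local hit returns only the local copy, which is the trade-off the text describes: the search is cheaper, but the allocation is made without global information.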


3.4 THE XFDPS.4 CONTROL MODEL

The XFDPS.4 model of control utilizes a file management strategy similar
to that of the ARAMIS Distributed Computer System [Caba79a,b] in which
multiple redundant file system directories are maintained on all nodes of the
system. However, since detailed information about the system described in
[Caba79a,b] is not available, model XFDPS.4 cannot be claimed to be an
accurate model of that system.

To preserve the consistency of the redundant copies of the file system
directory and to provide mutually exclusive access to resources, the following
steps are taken. A control message, the control vector (CV), is passed from
node to node according to a predetermined ordering of the nodes. The holder
of the CV can either release, reserve, or lock files. Therefore, each node
collects file system requests and waits for the CV to arrive. Once in
possession of the CV, a node can perform the actions necessary to fulfill the
requests it has collected.

The modifications to the file system directory are then placed into a
message called the update vector (UPV) which is passed to all nodes in order
to bring all copies of the file system directory into a consistent state.
When the UPV returns to the node holding the CV, all updates have been
recorded, and the CV can be sent on to the next node.
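
One circuit of the control vector can be sketched as below. This is a simplified illustration under stated assumptions: only lock requests are modeled, each directory copy is a plain dictionary from file name to owning node, and message delays are ignored. The function name is hypothetical.

```python
def circulate(order, pending, directories):
    """Pass the control vector (CV) once around the nodes in `order`.

    pending:     node -> list of file names that node wants to lock
    directories: node -> that node's copy of the directory {file: owner}
    Returns the granted locks as (node, file) pairs in the order applied.
    """
    granted = []
    for holder in order:                       # predetermined node ordering
        update_vector = {}
        for name in pending.pop(holder, []):   # requests collected while waiting
            if name not in directories[holder] and name not in update_vector:
                update_vector[name] = holder
                granted.append((holder, name))
        # The UPV visits every node so that all copies become consistent
        # before the CV moves on to the next holder.
        for node in order:
            directories[node].update(update_vector)
    return granted
```

Since only the CV holder may change the directory, two nodes requesting the same file cannot both succeed; the node earlier in the ordering wins.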

3.5 THE XFDPS.5 CONTROL MODEL

In the XFDPS.5 model, files are not reserved when the initial
availability request is made, and they are locked only after the work
distribution and resource allocation decision has been made. This strategy
leads to the possibility of generating an allocation plan that is impossible
to carry out if a file chosen for allocation has been given to another process
during the interval in which the resource allocation decision is made. In the
previous models, the executive control is assured of an allocation being
accepted, assuming no component fails.

3.6 THE XFDPS.6 CONTROL MODEL

In  the  XFDPS.1  model,  the  task  graph  for  a  particular  work  request  is 
maintained  as  a  single  unit  and  stored  on  only  one  node,  the  node  at  which  the 
work request originates. The XFDPS.6 model of control utilizes a slightly
different strategy. The task graph is constructed on a single node, but once
a work distribution and resource allocation decision has been made, portions
of the task graph are sent to various nodes. Specifically, those nodes chosen
to execute the various tasks of the task graph are given that portion of the
task graph for which they are responsible. Each node, then, must activate the
tasks assigned to it and collect termination information concerning those
tasks. When all tasks assigned to a particular node have terminated, the node
where the work request originally arrived is informed of their termination.
One can view this strategy as a two-level hierarchy.
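
The two-level hierarchy can be illustrated with a small sketch. The class names are hypothetical and the task graph is reduced to a mapping from task to executing node; task connectivity is omitted.

```python
from collections import defaultdict


def partition_tasks(assignment):
    """Group tasks by the node chosen to execute them (task -> node)."""
    parts = defaultdict(list)
    for task, node in assignment.items():
        parts[node].append(task)
    return dict(parts)


class OriginNode:
    """Upper level: expects one termination report per participating node."""

    def __init__(self, parts):
        self.waiting = set(parts)

    def node_finished(self, node):
        self.waiting.discard(node)
        return not self.waiting   # True once the whole work request is done


class WorkerNode:
    """Lower level: tracks the portion of the task graph it was given."""

    def __init__(self, node, tasks, origin):
        self.node = node
        self.active = set(tasks)
        self.origin = origin

    def task_terminated(self, task):
        self.active.discard(task)
        if not self.active:
            # Report to the originating node only when all local tasks end.
            return self.origin.node_finished(self.node)
        return False
```

Compared with XFDPS.1, the originating node receives one message per node rather than one per task.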

SECTION 4
THE  SIMULATOR 

In order to obtain quantitative information concerning the relative
performance of the various models of control, simulation experiments are
conducted. The goals of these experiments are to validate the models of control
described in Section 3 and gather data on their relative performance. In
order to be able to express the differences between the various models, it is
necessary that the simulator provide for the specification of relatively low
level features of the control models.

4.1 REQUIREMENTS FOR THE SIMULATOR

The goals described above necessitate the establishment of several
requirements for the simulator. In order to handle low level control problems
and document solutions to these problems, the control models must be defined
in a language capable of clearly expressing the level of detail required at
this stage of design. Because a number of models are to be tested, it is
important that the coding effort for these models be minimized.

It is expected that the architecture of the network as well as that of
individual nodes in the network will affect the relative performance of
various control models. Therefore, one must be able to easily modify various
architectural attributes. This includes network connectivity, network link
capacities, and the capacities and processing speeds of the individual nodes
of the network.

Validation of control models is one of the primary goals of the
simulation studies. To achieve this goal the simulator must provide the ability to
establish specific system states. In other words, specific detailed instances
of work requests need to be constructed along with the establishment of
specific resource states (e.g., one must be able to set up a series of files
in specific locations). These capabilities allow one to exercise specific
features of the control models.

The  simulation  studies  also  provide  performance  information.  The 
simulator  must  utilize  a  technique  for  generating  work  requests  reflecting 
specific distributions. It also needs to collect a variety of performance
measurements and generate appropriate statistical results.

4.2 THE STRUCTURE OF THE SIMULATOR

The simulator is event based and programmed in Pascal. It simulates the
hardware components of an FDPS, functions typically provided by local
operating systems, functions provided by a distributed and decentralized control,
and the load placed upon the system by users attached to the system through
terminals.
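
For readers unfamiliar with event-based simulation, the core loop can be sketched in a few lines. This is a generic illustration written in Python, not a reconstruction of the Pascal simulator; the class and method names are hypothetical.

```python
import heapq


class EventQueue:
    """Minimal event-based simulation loop ordered by simulated time."""

    def __init__(self):
        self._heap = []
        self._seq = 0     # tie-breaker so equal-time events keep insertion order
        self.now = 0

    def schedule(self, delay, action):
        """Arrange for action(queue) to run `delay` time units from now."""
        heapq.heappush(self._heap, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self):
        while self._heap:
            # Advance the clock directly to the next event; idle time is skipped.
            self.now, _, action = heapq.heappop(self._heap)
            action(self)
```

An event handler may schedule further events, which is how delays such as disk latency or message transfer time would be represented.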

4.2.1 Architecture Simulated

The hardware organization that is simulated is depicted in Figure 2.
The complete system consists of a number of nodes connected by half-duplex
communication links. Each node contains a CPU, a communications controller,
and perhaps a number of disks. Connected to each node are a number of user
terminals. The disk simulation is such that no actual information is stored;
only the delays experienced in performing disk input/output are considered.
User interprocess communication (IPC) is simulated with time delays but no
exchange of real data takes place. However, IPC between components of the
executive control involves both simulation of the time delays involved in
message transfer and the actual transfer of control information to another
simulated node.

4.2.2  Operating  System 

Components  typically  found  in  local  operating  systems  are  also 
simulated. These include the dispatcher and the device drivers. The local
operating systems are multitasking systems with each node capable of utilizing
a different time slice. User processes are serviced in a first come first
served manner and can be interrupted for any of the following reasons: 1) a
control process needs to execute (user process is delayed until the control
process releases the processor), 2) the user process exhausts its time slice
(user process is placed at the end of the READY QUEUE), 3) the user process
attempts to send or receive a message (user process is placed on the MESSAGE
BLOCKED QUEUE), or 4) the user process terminates.

The processes serviced by the simulator are capable of performing the
following actions: compute, send a message, receive a message, or terminate.
A process can access a file by communicating with a FILE PROCESS which is
activated for the specific purpose of providing access to the file for this
process. FILE PROCESSes are the only processes that initiate any disk
activity. As far as a user process is concerned, a file access is simply a
communication with another process.

Figure 2. The Architecture Supported by the Simulator for Each Node

The following process queues are maintained: READY QUEUE, DISK WAITING
QUEUE, and MESSAGE BLOCKED QUEUE. (See Figure 3.) A newly activated process
is placed in the READY QUEUE. The DISPATCHER selects a process from the READY
QUEUE to run on the CPU. If the running process exhausts its time slice, it
is  returned  to  the  READY  QUEUE.  If  it  either  attempts  to  send  or  receive  a 
message,  it  is  placed  in  the  MESSAGE  BLOCKED  QUEUE  where  it  remains  until 
either  the  message  is  placed  in  the  proper  link  queue  (send  operation)  or  a 
message  is  received  (receive  operation).  After  leaving  the  MESSAGE  BLOCKED 
QUEUE,  a  process  returns  to  the  READY  QUEUE. 
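
The READY QUEUE behavior for compute-only processes can be sketched as follows. The sketch is a simplification with hypothetical names: it ignores control-process preemption, message blocking, and disk waits, and models only time-slice expiry.

```python
from collections import deque


def run_ready_queue(bursts, time_slice):
    """Serve processes first come first served with a fixed time slice.

    bursts: list of (process name, CPU time still needed).
    Returns a list of (process name, completion time) in completion order.
    """
    ready = deque(bursts)
    finished = []
    clock = 0
    while ready:
        name, remaining = ready.popleft()      # DISPATCHER takes the head
        used = min(time_slice, remaining)
        clock += used
        if remaining > used:
            # Time slice exhausted: return to the end of the READY QUEUE.
            ready.append((name, remaining - used))
        else:
            finished.append((name, clock))
    return finished
```

With bursts of 3, 1, and 2 time units and a slice of 2, the first process is preempted once and therefore finishes last.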

The  only  processes  capable  of  performing  disk  input/output  on  the 
simulator  are  FILE  PROCESSes.  These  are  executive  control  processes  that  are 
assigned  to  provide  access  to  the  files  of  the  file  system.  When  a  file 
process  attempts  a  disk  access,  it  is  blocked  and  placed  in  the  DISK  WAITING 
QUEUE  for  processes  waiting  to  access  that  same  disk.  As  the  disk  requests 
are satisfied, these processes are returned to the READY QUEUE.

4.2.3  Message  System 

The communication system consists of a series of half-duplex connections
between pairs of nodes. Messages are transmitted using a store-and-forward
method. Messages received at intermediate nodes in a path are stored and
forwarded to the next node at a time dictated by the communication policy being
utilized.  For  example,  the  policy  may  require  that  the  new  message  be  placed 
at  the  end  of  the  queue  of  all  messages  to  be  transmitted  on  a  particular 
link.  (This  is  the  policy  utilized  in  all  experiments.) 

The  message  queues  available  on  each  node  are  depicted  in  Figure  4.  If 
a  newly  created  message  is  an  intranode  message,  it  is  placed  in  the  MESSAGE 
QUEUE; otherwise, it is placed in the LINK QUEUE that corresponds to the
communication link over which the message is to be transmitted. Messages are
removed  from  the  LINK  QUEUES  and  transmitted  as  the  communication  links  become 
available. 

Messages  in  the  MESSAGE  QUEUE  originate  either  from  processes  sending 
intranode messages or from the communication links connected to the node.
Messages destined for processes on the same node as the MESSAGE QUEUE are
placed in the appropriate PORT QUEUE of the process to which they are
addressed. Messages that have not yet reached their destination are placed in the
LINK  QUEUE  corresponding  to  the  communication  link  over  which  the  message  is 
to  be  transmitted. 
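
The movement of messages among the MESSAGE QUEUE, LINK QUEUES, and PORT QUEUES can be sketched as below. The message fields and the `next_hop` routing table are assumptions introduced for illustration; the report does not specify how the outgoing link is chosen.

```python
from collections import deque


class Node:
    """Message queues of one node, following the routing rules in the text."""

    def __init__(self, node_id, next_hop):
        self.id = node_id
        self.next_hop = next_hop      # destination node -> neighbor (assumed table)
        self.message_queue = deque()
        self.link_queues = {}         # neighbor -> FIFO of outgoing messages
        self.port_queues = {}         # (process, port) -> delivered messages

    def send(self, msg):
        """Newly created message: intranode -> MESSAGE QUEUE, else a LINK QUEUE."""
        if msg["dest_node"] == self.id:
            self.message_queue.append(msg)
        else:
            self._to_link(msg)

    def receive_from_link(self, msg):
        self.message_queue.append(msg)

    def _to_link(self, msg):
        neighbor = self.next_hop[msg["dest_node"]]
        # New messages go to the end of the queue for that link.
        self.link_queues.setdefault(neighbor, deque()).append(msg)

    def drain_message_queue(self):
        while self.message_queue:
            msg = self.message_queue.popleft()
            if msg["dest_node"] == self.id:
                key = (msg["process"], msg["port"])
                self.port_queues.setdefault(key, deque()).append(msg)
            else:
                self._to_link(msg)    # store-and-forward toward the destination
```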

Figure 3. Process Queues on Each Node

Figure 4. Message Queues on Each Node

4.2.4 Input for the Simulator

The  simulator  requires  the  following  six  types  of  input: 

1. Control model

2.  Network  configuration  (i.e.,  nodes  and  their  connectivity) 

3.  Work  requests 

4.  Command  files 

5. Object files

6.  Data  files 

The  nature  of  these  inputs  and  how  they  are  provided  to  the  simulator  is 
described  below. 

4.2.4.1 Control Model

There are two possible approaches for representing the control model in
the  simulator:  1)  data  to  be  interpreted  by  the  simulator  and  2)  code  that  is 
actually  part  of  the  simulator.  The  first  technique  requires  that  the 
simulator  contain  or  include  a  rather  sophisticated  interpreter  in  order  to 
provide  a  convenient  language  with  which  one  can  express  a  control  model  that 
addresses the control problems to a sufficiently low level of detail. The
second  technique  requires  the  careful  construction  of  the  simulator  such  that 
those  portions  of  the  simulator  that  express  the  control  model  are  easily 
identified  and  can  be  removed  and  modified  with  minimal  effort.  The  second 
technique  also  requires  a  recompilation  of  the  simulator  code  each  time  a 
control  model  modification  is  performed. 

The  problems  involved  in  constructing  a  sophisticated  interpreter  are 
much greater than those faced in organizing the simulator so that the portions
of code expressing the control model are easily isolated. Therefore, in this
simulator, the control models are expressed in Pascal and are actually part of
the  simulator  rather  than  being  separate  input  to  the  simulator. 

4.2.4.2 Network Configuration

The  attributes  provided  as  input  to  the  simulator  which  are  concerned 
with  the  physical  configuration  of  the  FDPS  are  provided  in  Table  2.  Figure  5 
describes  the  syntax  of  the  statements  used  to  enter  the  FDPS  configuration 
information. Two types of input can be provided: node configuration
information and communication linkage information. Each statement beginning with the
letter 'n' describes the configuration of the node which is identified by the
digit following the 'n'. This statement describes certain characteristics
concerning the processor at the node (memory capacity, processing speed, and
the length of a user time slice) and the peripheral devices (user terminals
and disks) attached to the processor. Each statement beginning with the
letter 'l' describes a half-duplex communication link between two nodes. It
identifies the source and destination nodes by their identification number
(the digit following the letter 'n' on statements describing nodes) and
indicates the effective bandwidth of the communication link. It is assumed
that all messages are transmitted at this speed, and no attempt is made to
simulate errors in transmission and the resulting retransmissions.
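
A short sketch of how such statements might be read is given here. The field order follows the syntax in Figure 5; the class and function names are hypothetical, and the link statement is assumed to begin with a lowercase letter 'l'.

```python
from dataclasses import dataclass


@dataclass
class NodeConfig:
    node_id: int
    memory: int         # bytes
    speed: int          # instructions per second
    timeslice: int      # microseconds
    terminals: int
    disks: int
    disk_speed: int     # bytes per second
    disk_latency: int   # microseconds


@dataclass
class LinkConfig:
    source: int
    dest: int
    bandwidth: int      # bytes per second


def parse_statement(line):
    """Parse one 'n' (node) or 'l' (link) configuration statement."""
    fields = line.split()
    if not fields:
        raise ValueError("empty configuration statement")
    kind, numbers = fields[0], [int(f) for f in fields[1:]]
    if kind == "n" and len(numbers) == 8:
        return NodeConfig(*numbers)
    if kind == "l" and len(numbers) == 3:
        return LinkConfig(*numbers)
    raise ValueError(f"unrecognized configuration statement: {line!r}")
```

For example, `parse_statement("l 5 6 4000000")` yields a link from node 5 to node 6 with a bandwidth of 4,000,000 bytes per second.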


Table  2.  Physical  Configuration  Input  to  the  Simulator 

Node Information

Memory Capacity (bytes)
Processing Speed (instructions/second)
Size of a Time Slice (microseconds)
Number of Attached User Terminals
Number of Attached Disks
Disk Transfer Speed (bytes/second)
Average Disk Latency (microseconds)

Link Information

Identities of the Source and Destination Nodes
Bandwidth (bytes/second)


4.2.4.3 Work Requests

Work  requests  are  assumed  to  originate  from  two  sources:  1)  directly 
from a user, or 2) through command files. The syntax of a work request is
given in Figure 6. This syntax is a subset of the command language available
through the Advanced Command Interpreter of the Georgia Tech Software Tools
System [Akin80].

A  work  request  is  basically  a  specification  of  a  logical  network  of 
tasks.  The  nodes  of  the  logical  network  represent  tasks  and  the  links 
represent  communication  paths  between  the  tasks.  A  node  specification 
includes  the  following:  an  optional  label  to  identify  the  node,  a  command 
name (this may name either an object file or a command file), and any I/O
redirection.  A  node  can  be  identified  either  by  its  label,  if  it  possesses 
one, or by its position on the command line. For example, in the command

<entry> ::= <link> | <node>

<link> ::= l <from> <to> <bandwidth>    (all links are half-duplex)

<node> ::= n <node id> <memory> <speed> <timeslice> <terminals>
           <disk> <disk speed> <disk latency>

<from> ::= <node id>

<to> ::= <node id>

<node id> ::= <integer>

<bandwidth> ::= <integer (link bandwidth in bytes per second)>

<memory> ::= <integer (main memory in bytes)>

<speed> ::= <integer (average speed of the CPU in instructions per second)>

<timeslice> ::= <integer (microseconds)>

<terminals> ::= <integer (number of attached user terminals)>

<disk> ::= <integer (number of attached disks)>

<disk speed> ::= <integer (transfer speed of disk in bytes/sec)>

<disk latency> ::= <integer (average disk latency in microseconds)>

<integer> ::= <digit> { <digit> }


Examples:

n 1 256000 5000000 1000 50 3 500000 100

(Node #1 has 250K bytes of memory, processes at the rate of
5 MIPS, has a time slice of 1000 microseconds, has 50 user
terminals attached to it, has 3 disks attached to it,
each disk can transfer at the rate of 500 000 bytes/sec,
and each disk has an average latency of 100 microseconds.)

l 5 6 4000000

(This link connects node 5 to node 6 with a half-duplex
communication path that can transmit at the rate of
4 million bytes/sec.)

Figure 5. Syntax of FDPS Configuration Input for the Simulator

<work request> ::= <logical net>

<logical net> ::= <logical node> { <node separator>
                  { <node separator> } <logical node> }

<node separator> ::= , | <pipe connection>

<pipe connection> ::= [ <port> ] '|' [ <logical node number> ]
                      [ .<port> ]

<port> ::= <integer>

<logical node number> ::= <integer> | $ | <label>

<logical node> ::= [ :<label> ] <simple node>

<simple node> ::= { <i/o redirector> } <command name>
                  { <i/o redirector> }

<i/o redirector> ::= <file name> '>' [ <port> ] |
                     [ <port> ] '>' <file name> |
                     [ <port> ] '>>' <file name> |
                     '>>' [ <port> ]

<command name> ::= <command file name> | <object file name>

<label> ::= <identifier>

<file name> ::= <data file name>

<identifier> ::= <letter> { <letter> | <digit> }

<integer> ::= <digit> { <digit> }

Examples:

pgm1 | pgm2 1|a 2|b :a pgm3 | pgm4 |c.1 :b pgm5 | pgm6 |.2 :c pgm7
(For an explanation of this example see Figure 7.)

Figure 6. Work Request Syntax
(Based  on  [AKIN803) 
below, the second node has the label 'a' and the command name 'cmnd2'.

cmnd1 | :a cmnd2

This node can be identified either by the label 'a' or by its position '2',
but not by its name, 'cmnd2'.

I/O redirection is used to connect the ports of a task to files in the file
system. (The default for I/O is "standard input/output," i.e., the user's
terminal.) In the example below, input port number three is connected to file
'in' and output port number one is connected to file 'out'.

in>3 cmnd 1>out

The specification of the port number in the I/O redirector is optional. If it
is omitted, the next unused port number is assumed. Therefore, in the example
below, output port number one is connected to file 'out1', output port number
two is connected to file 'out2', and output port number three is connected to
file 'out3'.

cmnd >out1 2>out2 >out3
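
The "next unused port" rule can be illustrated with a short sketch. It
assumes that redirectors are scanned from left to right and that an explicit
port number is claimed at the moment it is encountered; the text does not
state the scanning order, and the function name assign_output_ports is
hypothetical.

```python
def assign_output_ports(redirectors):
    """Map the file named by each output redirector to an output port.

    Each redirector has the form '[<port>]><file name>', for example
    '>out1' or '2>out2'. An omitted port takes the next unused number.
    """
    used = set()
    assignment = {}
    for item in redirectors:
        port_text, file_name = item.split(">", 1)
        if port_text:
            port = int(port_text)
        else:
            port = 1
            while port in used:     # search for the next unused port
                port += 1
        used.add(port)
        assignment[file_name] = port
    return assignment


ports = assign_output_ports([">out1", "2>out2", ">out3"])
print(ports)
```

Applied to the three output redirectors of the last example, the sketch
assigns ports one, two, and three to 'out1', 'out2', and 'out3', as the text
describes.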

Nodes are separated by node separators, which can be either the comma
symbol or the vertical bar symbol. The comma symbol is used to separate a
node that does not have any output ports connected to any other nodes. The
vertical bar symbol, or pipe symbol, is used to identify the connection of an
output port of the node immediately preceding the pipe symbol to the input
port of another node. The port numbers and logical node number of the pipe
specification may be omitted and default values assumed. If a port number is
omitted, the next unused port number for the node possessing the port is used.
The logical node number of the pipe specification identifies a node of the
logical network. It may be an integer identifying the position of the node on
the command line, the symbol '$', which identifies the last node on the
command line, or a node label. If no other node is specified, the node
immediately following the pipe symbol is assumed to be the destination of the
output of the pipe.

An example of a work request utilizing this syntax is shown in Figure 7.
This command consists of seven logical nodes connected in the manner depicted
in the figure. It demonstrates several forms of pipe specifications, including
the use of labels in identifying nodes.
Work Request:

pgm1 | pgm2 1|a 2|b :a pgm3 | pgm4 |c.1 :b pgm5 | pgm6 |.2 :c pgm7
    (0)     (1) (2) (3)    (4)     (5)  (6)    (7)     (8) (9)

(0) Output port 1 of pgm1 is connected to input port 1 of pgm2.

(1) Output port 1 of pgm2 is connected to input port 1 of the
    logical node labeled "a," pgm3.

(2) Output port 2 of pgm2 is connected to input port 1 of the
    logical node labeled "b," pgm5.

(3) Label for the logical node containing pgm3 as its execution
    module.

(4) Output port 1 of pgm3 is connected to input port 1 of pgm4.

(5) Output port 1 of pgm4 is connected to input port 1 of the
    logical node labeled "c," pgm7.

(6) Label for the logical node containing pgm5 as its execution
    module.

(7) Output port 1 of pgm5 is connected to input port 1 of pgm6.

(8) Output port 1 of pgm6 is connected to input port 2 of pgm7.

(9) Label for the logical node containing pgm7 as its execution
    module.


Data Flow Graph of the Work Request: (diagram not reproduced)

Figure 7. Example of a Work Request
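
The connections listed in Figure 7 can be derived mechanically from the work
request. The sketch below is an illustration under stated assumptions, not
the simulator's COMMAND INTERPRETER: it expects tokens to be separated by
blanks, ignores I/O redirectors, and resolves an omitted port as the smallest
port number not yet used on that node at the time the pipe is processed. The
names PIPE, next_unused, and parse_work_request are hypothetical.

```python
import re

# [<port>] '|' [<logical node number>] [.<port>]
PIPE = re.compile(r"^(\d*)\|([A-Za-z]\w*|\d+|\$)?(?:\.(\d+))?$")


def next_unused(used):
    port = 1
    while port in used:
        port += 1
    return port


def parse_work_request(text):
    """Return edges (source, out port, destination, in port)."""
    nodes, labels, pipes = [], {}, []
    pending_label = None
    for tok in text.split():
        match = PIPE.match(tok)
        if match:
            out_port, dest, in_port = match.groups()
            pipes.append((len(nodes) - 1,
                          int(out_port) if out_port else None,
                          dest,
                          int(in_port) if in_port else None))
        elif tok == ",":
            continue                    # separator without a connection
        elif tok.startswith(":"):
            pending_label = tok[1:]     # label applies to the next node
        else:
            if pending_label is not None:
                labels[pending_label] = len(nodes)
                pending_label = None
            nodes.append(tok)

    used_out = [set() for _ in nodes]
    used_in = [set() for _ in nodes]
    edges = []
    for src, out_port, dest, in_port in pipes:
        if dest is None:
            dst = src + 1               # node following the pipe symbol
        elif dest == "$":
            dst = len(nodes) - 1        # last node on the command line
        elif dest.isdigit():
            dst = int(dest) - 1         # positions are 1-based
        else:
            dst = labels[dest]
        if out_port is None:
            out_port = next_unused(used_out[src])
        if in_port is None:
            in_port = next_unused(used_in[dst])
        used_out[src].add(out_port)
        used_in[dst].add(in_port)
        edges.append((nodes[src], out_port, nodes[dst], in_port))
    return edges


request = ("pgm1 | pgm2 1|a 2|b :a pgm3 | pgm4 |c.1 "
           ":b pgm5 | pgm6 |.2 :c pgm7")
edges = parse_work_request(request)
for edge in edges:
    print(edge)
```

For the work request of Figure 7 the sketch yields seven edges, one for each
of the pipe connections annotated (0), (1), (2), (4), (5), (7), and (8).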
In order to simulate the load generated by users entering work requests
from user terminals, a population of work requests is created. The form of
the input for creating the work request population is provided in Figure 8.
Each line of input contains a series of node identifiers followed by a colon,
which is followed by a work request. The node identifiers indicate which
nodes are to contain the given work request as a member of the node's
population of work requests. The result of this input is therefore the
construction of a population of work requests for each node. The way the load
generator uses this information is discussed in a subsequent paragraph.


<work request population> ::= <work request entry>
                              { <work request entry> }

<work request entry> ::= { <node identifier> } : <work request>

<node identifier> ::= <integer>

<work request> ::= (see Figure 6)

<integer> ::= <digit> { <digit> }

Examples:

1 2 3 4 5 : pgm1 | pgm2      { the work request 'pgm1 | pgm2'
                               is available on nodes 1, 2, 3,
                               4, and 5 }

1 3 : pgm1                   { the work request 'pgm1' is
                               available on nodes 1 and 3 }


Figure 8. Syntax of Work Request Population Input to the Simulator
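
The per-node populations described above can be built with a short routine.
This is a sketch that assumes node identifiers are separated by blanks and
that the first colon on a line ends the identifier list (colons that
introduce labels appear only later, inside the work request). The function
name build_populations is hypothetical.

```python
from collections import defaultdict


def build_populations(lines):
    """Return {node id: [work request, ...]} from Figure 8 style input."""
    populations = defaultdict(list)
    for line in lines:
        ids, _, request = line.partition(":")   # split at the first colon
        for node_id in ids.split():
            populations[int(node_id)].append(request.strip())
    return dict(populations)


populations = build_populations([
    "1 2 3 4 5 : pgm1 | pgm2",
    "1 3 : pgm1",
])
for node_id in sorted(populations):
    print(node_id, populations[node_id])
```

With the two example lines of Figure 8, nodes 1 and 3 each hold both work
requests, while nodes 2, 4, and 5 hold only 'pgm1 | pgm2'.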


4.2.4.4 Command Files

Command files are constructed for the simulator using the syntax
described in Figure 9. This input specifies a unique name for the file, the
simulated node at which the file resides, and the commands contained in the
file. These commands conform to the syntax of work requests presented in
Figure 6. These statements make it possible to construct command files on
particular nodes, which can then be referenced either by commands originating
from user terminals or by other command files.


<command file> ::= C <node id> <command file name>
                   { <work request> }
                   ENDC

<node id> ::= <integer>

<command file name> ::= <up to 8 characters>

<work request> ::= (see Figure 6)

<integer> ::= <digit> { <digit> }


Examples:

C 1 cfile1
pgm1 | pgm2 1|a 2|b :a pgm3 | pgm4 |c.1 :b pgm5 | pgm6 |.2 :c pgm7
pgm1 | pgm5
ENDC


Figure 9. Syntax of Command File Input to the Simulator


4.2.4.5 Object Files

Figure 10 depicts the syntax used to express object files in the
simulator. The input specifies a unique name for the file, the simulated node
at which the file resides, the length of the file in bytes, and the simulation
script. The script contains a series of statements that describe the process
actions that are to be simulated. Five actions can be simulated: 1) compute,
2) receive a message, 3) send a message, 4) loop back to a previous command a
specific number of times, and 5) terminate the process simulation. By
appropriately combining these commands, one can construct a script which
simulates the activities of a given user process.
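
The five script actions can be traced with a very small interpreter. The
sketch below is an illustration, not the NODE MODULE itself. It assumes the
one-letter action codes of Figure 10 (c, l, r, s, t) and assumes that a loop
action lets the preceding commands run once and then repeats them the stated
number of times; this reading is inferred from Section 5.1.2, where a script
that loops back 500 times is said to perform 501 iterations. The function
name run_script and the counter names are hypothetical.

```python
def run_script(actions):
    """Trace a script and total its compute, receive, and send activity."""
    totals = {"instructions": 0, "received": 0, "sent_bytes": 0}
    remaining = {}      # position of a loop action -> loop-backs left
    pc = 0
    while pc < len(actions):
        op, *args = actions[pc].split()
        if op == "c":                       # compute
            totals["instructions"] += int(args[0])
        elif op == "r":                     # receive a message on a port
            totals["received"] += 1
        elif op == "s":                     # send <size> bytes to a port
            totals["sent_bytes"] += int(args[1])
        elif op == "l":                     # loop back <count> times
            target, count = int(args[0]), int(args[1])
            left = remaining.get(pc, count)
            if left > 0:
                remaining[pc] = left - 1
                pc = target - 1             # instruction numbers are 1-based
                continue
            remaining.pop(pc, None)         # reset for a later re-entry
        elif op == "t":                     # terminate
            break
        pc += 1
    return totals


figure_10 = run_script(["c 25", "l 1 10", "r 2", "s 4 100", "t"])
print(figure_10)
```

Under these assumptions the example script of Figure 10 executes its 25
compute instructions eleven times before receiving one message and sending
100 bytes.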

<object file> ::= O <node id> <object file name> <object file length>
                  { <action> }
                  ENDO

<node id> ::= <integer>

<object file name> ::= <up to 8 characters>

<object file length> ::= <integer>

<action> ::= <comp> | <loop> | <rcv> | <send> | <term>

<comp> ::= c <# of instructions>

<loop> ::= l <instruction #> <count>

<rcv> ::= r <port>

<send> ::= s <port> <size (bytes)>

<term> ::= t

<# of instructions>, <instruction #>, <count>, <port>,
<size> ::= <integer>

<integer> ::= <digit> { <digit> }


Examples:

O 1 object1 1000    (object file is 1000 bytes long)
c 25                (simulate 25 computation instructions)
l 1 10              (loop back to the first instruction 10 times)
r 2                 (read a message from port 2)
s 4 100             (send a message of 100 bytes in length to port 4)
t                   (terminate the execution of this process)
ENDO


Figure 10. Syntax of Object File Input to the Simulator


4.2.4.6 Data Files

Data files, depicted in Figure 11, are the final type of file which can
be presented to the simulator. The data file input contains an identifying
name, a node identification indicating the file's simulated location, and a
specification of the file size. Data is not actually stored by the simulator.


<data file> ::= D <node id> <data file name> <size>

<node id> ::= <integer>

<data file name> ::= <up to 8 characters>

<size> ::= <integer (bytes)>

<integer> ::= <digit> { <digit> }

Examples:

D 3 testfile 100000     (defines a data file named 'testfile'
                         which will reside on node 3 and will
                         contain 100,000 bytes of information)

Figure 11. Syntax of Data File Input to the Simulator


4.2.5 Simulator Design

The simulator is composed of several modules. In each module, closely
related data structures and the procedures that modify these data structures
are defined. The only access to the data structures is through these
procedures. This design allows one to isolate the portion of the simulator
that represents the model of control and conduct experiments with various
perturbations of the control model. Without this type of design, each
perturbation could easily require significant changes to the entire simulator.
The major modules of the simulator are described below.

4.2.5.1 Node Module

The NODE MODULE simulates the hardware activities of each node (e.g.,
the processor and attached disks). This includes the simulation of user
activities as specified by process scripts and the simulation of disk traffic.
In addition, this module provides the local operating system functions of
dispatching, blocking processes for message transmission or reception, and
unblocking processes.

4.2.5.2 Message System

All activities dealing with messages are handled by the MESSAGE SYSTEM.
Among the services provided by this module are the following: 1) routing of
messages, 2) placement of messages in LINK QUEUES, 3) transmission of messages
across a link, 4) transmission of acknowledgement signals to the source end of
a link, and 5) placement of messages in PORT QUEUES.

4.2.5.3 File System

The FILE SYSTEM stores the various types of files, which include object,
command, and data files. It stores the scripts for object files and provides
access to the scripts. Similarly, for command files, it stores the work
requests for each command file and controls access to the file. It maintains
directories that provide location information and access control information.
All executive control actions pertaining to the file system are contained in
this module.

4.2.5.4 Command Interpreter

The COMMAND INTERPRETER parses work requests and constructs the task
graph describing the initial resource requirements for a work request.

4.2.5.5 Task Set and Process Manager

The TASK SET AND PROCESS MANAGER performs all control activities
required to manage all phases of execution of a work request. This includes
activating the COMMAND INTERPRETER; communicating with the FILE SYSTEM in
order to gather information, allocate files, or deallocate files; performing
work distribution and resource allocation; and managing active processes.

4.2.5.6 Load Generator

Work request traffic originating from the user terminals attached to
each node is created by the LOAD GENERATOR. A series of work requests
provided by a user at a terminal is called a user session. To simulate a user
session, the LOAD GENERATOR randomly chooses a session length from a
user-specified interval. A session starting time (measured in seconds) is also
chosen at random from a user-specified interval. Each work request for the
user session is chosen at random from the population of work requests
originally created for each node via the input statements described above (see
Figure 8). The LOAD GENERATOR also simulates the "think time" between work
requests by randomly choosing a time (measured in seconds) from a
user-specified interval.
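
The session-building steps of the LOAD GENERATOR can be sketched as follows.
The text says only that each value is chosen at random from a user-specified
interval; the sketch assumes uniform integer draws over inclusive intervals
and assumes the session length counts work requests. The function and
parameter names are hypothetical.

```python
import random


def generate_session(population, length_range, start_range, think_range, rng):
    """Return (start time, [(think time, work request), ...]) for one session."""
    length = rng.randint(*length_range)     # number of work requests
    start = rng.randint(*start_range)       # session start, in seconds
    requests = [(rng.randint(*think_range), rng.choice(population))
                for _ in range(length)]
    return start, requests


rng = random.Random(1982)                   # fixed seed for repeatable runs
population = ["pgm1 | pgm2", "pgm1"]
start, requests = generate_session(population, (2, 5), (1, 15), (1, 15), rng)
print(start, requests)
```

Passing the random number generator as a parameter keeps a run repeatable,
which is convenient when the same load has to be offered to several control
models.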

4.2.6 Performance Measurements

Performance measurements are made concerning three types of data: 1)
the quantity of message traffic, 2) the magnitudes of various queue lengths
and their associated waiting times, and 3) average work request response
times and throughput.
To identify the impact of the executive control on the communication
system, various communication measurements are obtained. A cumulative total
of the number of user messages and control messages over the entire system is
maintained. This allows one to compare the number of control messages to the
number of user messages and thus identify how the communication system is
being utilized. In addition, a count, again categorized by user messages and
control messages, is maintained in matrix form to identify the total number of
messages originating at a particular node and destined for every other node.
Traffic counts on each communication link are also recorded according to their
classification as user messages or control messages. Finally, activity in the
LINK QUEUES, where messages wait to be transmitted over each link, is
recorded. These measurements include minimum queue length, maximum queue
length, average queue length, minimum waiting time in the queue, maximum
waiting time, and average waiting time.
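
The six queue measurements can be accumulated as messages enter and leave a
queue. The sketch below is one possible bookkeeping scheme, not the
simulator's own: it assumes first-in, first-out service and a time-weighted
average queue length, neither of which is stated in the text, and the class
name QueueStats is hypothetical.

```python
class QueueStats:
    """Accumulate length and waiting-time statistics for one queue."""

    def __init__(self, start_time=0.0):
        self.start_time = start_time
        self.last_change = start_time
        self.length = 0
        self.min_length = 0
        self.max_length = 0
        self.area = 0.0         # integral of queue length over time
        self.arrivals = []      # arrival times of queued messages (FIFO)
        self.waits = []         # waiting times of departed messages

    def _advance(self, now):
        self.area += self.length * (now - self.last_change)
        self.last_change = now

    def enqueue(self, now):
        self._advance(now)
        self.arrivals.append(now)
        self.length += 1
        self.max_length = max(self.max_length, self.length)

    def dequeue(self, now):
        self._advance(now)
        self.waits.append(now - self.arrivals.pop(0))
        self.length -= 1
        self.min_length = min(self.min_length, self.length)

    def summary(self, now):
        self._advance(now)
        elapsed = now - self.start_time
        return {
            "min_length": self.min_length,
            "max_length": self.max_length,
            "avg_length": self.area / elapsed if elapsed > 0 else 0.0,
            "min_wait": min(self.waits) if self.waits else None,
            "max_wait": max(self.waits) if self.waits else None,
            "avg_wait": sum(self.waits) / len(self.waits) if self.waits else None,
        }


queue = QueueStats()
queue.enqueue(0.0)
queue.enqueue(1.0)
queue.dequeue(3.0)
queue.dequeue(4.0)
stats = queue.summary(4.0)
print(stats)
```

In this small run the queue holds one message for two of the four seconds
and two messages for the other two, so the time-weighted average length is
1.5.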

In addition to measurements concerned with the LINK QUEUES, a similar
analysis of process queues is performed. The queues on each node that are
analyzed are the READY QUEUE (processes waiting for access to the CPU), the
MESSAGE BLOCKED QUEUE (processes that are waiting either to place a message in
a LINK QUEUE or to receive a message), and the DISK WAITING QUEUES
(processes waiting for access to a particular disk). The types of
measurements obtained are identical to those for the LINK QUEUES.

To identify the effectiveness of the control strategy, measurements are
obtained that identify how effectively user processing is accomplished. For
each node and cumulatively for all nodes, the following measurements are
obtained for user sessions, work requests, and processes:

1.  The total number of user sessions, work requests, and
    processes.

2.  The average service time for each user session, work request,
    and process.

3.  The average response time for each user session, work request,
    and process.

4.  The throughput for user sessions, work requests, and processes.
SECTION 5

THE SIMULATION EXPERIMENTS


In the second phase of experimentation, two groups of simulation
experiments designed to measure the performance of the various models in an
FDPS environment are conducted. In addition, a number of experiments are
conducted with a single node network. In the first group of FDPS experiments,
only one work request is processed by the entire network. The intent of this
set of experiments is to determine the minimum delay experienced by a work
request with each model of control. In the second group of experiments, a
load is placed on all nodes. These studies are designed to examine the
behavior of the various models of decentralized control operating in a
production mode with various physical interconnection topologies. The single
node experiments provide a means of comparing the performance of an FDPS to
that of isolated uniprocessors.


5.1 THE SIMULATION ENVIRONMENT


The  environment  in  all  FDPS  experiments  consists  of  a  network  of  five 
nodes interconnected in various ways providing five different interconnection
topologies:  1)  a  unidirectional  ring,  2)  a  bidirectional  ring,  3)  a  star,  4) 
a  fully  connected  network,  and  5)  a  tree.  (See  Figure  12.)  The  nodes  of  each 
network  (see  Figure  2)  are  all  homogeneous,  and  each  consists  of  a  processor 
capable  of  executing  one  million  instructions  per  second.  Connected  to  each 
node  are  ten  user  terminals  and  three  disk  drives.  The  disks  are  assumed 
identical,  each  with  an  average  latency  of  100  microseconds  and  a  transfer 
rate  of  500,000  bytes  per  second. 


5.1.1  Environmental  Variables 

In addition to different topologies, the bandwidth of the communication
links and the model of control are also varied for the experiments. Table 3
provides a brief comparison of the various models. Only the first four models
of control (XFDPS.1, XFDPS.2, XFDPS.3, and XFDPS.4) are utilized in these
initial experiments. Models XFDPS.5 and XFDPS.6 differ from model XFDPS.1 in
details that are not examined in these experiments. They are therefore not
considered, because their observable results would be identical to those of
XFDPS.1. It is instructive, though, to note that not all model variations
result in performance differences. Finally, it should be noted that the
central directory of model XFDPS.2 is maintained on node 1 in all experiments.

Figure 12. Network Interconnection Topologies

5.1.2 Environmental Constants

Several environmental features remain constant for all experiments. In
all cases, it is assumed that all control messages are 50 bytes long. All
control models utilize the same policy for distributing work and allocating
resources. This policy simply requires all processes to execute on the node
where the object code for that process resides. There is only one copy of the
object code for each process in the network for these initial experiments.
The work distribution and resource allocation policy utilized for these tests
requires that data files be accessed at the location where they originally
reside and not be moved prior to execution. In every experiment, all files
are unique, thus leaving the control with only one resource allocation
alternative.

The work requests arriving at all nodes are of the type 'in> cmnd'. The
data file 'in' provides input to the process resulting from the loading of the
object file 'cmnd'. This provides an environment in which files are accessed
only by means of reads, thus eliminating the possibility that certain work
requests are either delayed or aborted due to insufficient resources.
Therefore, it is guaranteed that all control activity results in the
successful completion of a work request.

In all cases, the object file 'cmnd' and data file 'in' are located on
the same node. This means that all file accesses are local file accesses and
thus control message traffic is free of competition by user messages for
communication resources. This provides an environment in which the effects of
the control models can be more directly observed without the influence of an
unpredictable collection of user messages.

The object files in each case specify the execution of the same script,
which is depicted in Figure 13. This script describes a process that
alternately computes and reads from a data file for 501 iterations. Given the
speed of the processors utilized in the experiments, this results in
approximately 5 seconds of CPU time for each process.
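
The figure of approximately 5 seconds follows directly from the numbers given
in the text: 10,000 compute instructions per iteration, 501 iterations (one
initial pass plus 500 loop-backs), and a processor speed of one million
instructions per second. The short calculation below only restates that
arithmetic; the function name script_cpu_seconds is hypothetical, and the
time spent on the read action is ignored.

```python
def script_cpu_seconds(instructions_per_pass, loop_backs, cpu_speed):
    """CPU time of a script body run once and then repeated loop_backs times."""
    passes = loop_backs + 1
    return passes, passes * instructions_per_pass / cpu_speed


passes, seconds = script_cpu_seconds(10_000, 500, 1_000_000)
print(passes, seconds)
```

The result is 501 passes and about 5.01 seconds of computation per process.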
Table 3. Comparison of the Control Models

{ 10,000 compute instructions }
{ read from port 1 }
{ loop back to instruction one 500 times }
{ terminate the process }


Figure 13. The Script Utilized By All Processes


5.2 GROUP 1 EXPERIMENTS

5.2.1 The Environment

The first group of experiments is designed to demonstrate the minimum
delay experienced by a single work request as a result of utilizing each model
of control. In this set of experiments, all topologies are investigated in
addition to various bandwidths ranging from 1200 to 500,000 bytes per second.
These experiments examine situations in which work requests arrive at both
nodes 1 and 2. In addition, the location of the object-data file pairs named
in the work request is varied over all five nodes.

Each of these tests requires the simulator to process only one work
request, thus eliminating competition for resources by other work requests.
The work request response times for each environment (model, topology,
bandwidth, and location of object-data file pair) are provided in Appendix
2.1.

5.2.2 Observations

A comparison of the results of this set of experiments can be seen in
Figures 14 and 15. Figure 14 shows the results for work requests arriving at
node 1. Node 1 is chosen in order to demonstrate how XFDPS.2 (the model with
a centralized file system directory located on node 1) can benefit from the
location of a work request. In all of these cases, model XFDPS.2 provides the
smallest response times. When the work request arrives at another node (e.g.,
node 2), XFDPS.2 no longer provides the minimum response time in all cases.

The sensitivity of XFDPS.2 to the location of the work request can be
attributed to the location of the central file system directory on node 1. If
a work request arrives at node 1, all resource allocation can be performed
without requiring the transmission of any control messages. The only control
messages needed are those necessary to activate the file processes for each
file named in the work request. These messages are transmitted once the files
have been allocated. If the work request arrives at node 2, a message must be
sent to node 1 in order to allocate the resources. Once the resources have
been allocated, the messages to activate file processes can be sent.
Therefore, a two-stage operation with two sets of messages results from this
scenario.

i = j means response time using model i is similar to that using j

Figure 14. Comparison of the Response Times for Models 1, 2, 3, and 4
that Were Obtained from the Group 1 Experiments in Which
Work Requests Arrived at Node 1

Figure 15. Comparison of the Response Times for Models 1, 2, 3, and 4
that Were Obtained from the Group 1 Experiments in Which
Work Requests Arrived at Node 2

XFDPS.1 and XFDPS.3 provide an alternate strategy, which explains their
superior performance relative to XFDPS.2 when the work request arrives at
node 2. In these models, file allocation and file process activation are
accomplished with one message because the directory for a file and the file
itself reside on the same node. Therefore, once a file has been allocated, the
file process can be activated with an intranode operation.

In all but two cases, XFDPS.4 results in the largest response time of
all the models. Only when the work request arrives at node 2 in a network
consisting of a unidirectional ring with a bandwidth of 1200 bytes per second
does this model perform better than the other models. This particular
topology provides the longest paths between nodes, thus making it quite
susceptible to communication problems. Model XFDPS.4 performs better at low
bandwidths than the other models for this particular topology because only one
message is present on the communication net once a work request is being
processed. During the resource allocation phase, the update vector (UPV)
circulates about the ring; and, after this step, the control vector (CV) is
present on the ring. In all other models, multiple messages are utilized to
process a work request; thus, at low bandwidths, message throughput becomes a
problem.
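
The remark that the unidirectional ring provides the longest paths can be
checked by counting hops between every ordered pair of nodes. The sketch
below does this for three of the five-node topologies whose shape is
unambiguous (unidirectional ring, bidirectional ring, and fully connected
network). The star and the tree are omitted because their exact layout in
Figure 12 is not recoverable here; the node numbering, the ring direction,
and the function names are assumptions made for illustration.

```python
from collections import deque


def hop_counts(n, links, directed):
    """Return {(src, dst): hops} for every ordered pair of distinct nodes."""
    neighbors = {i: set() for i in range(1, n + 1)}
    for a, b in links:
        neighbors[a].add(b)
        if not directed:
            neighbors[b].add(a)
    hops = {}
    for src in neighbors:
        dist = {src: 0}
        queue = deque([src])
        while queue:                        # breadth-first search
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for dst, d in dist.items():
            if dst != src:
                hops[(src, dst)] = d
    return hops


def mean(values):
    values = list(values)
    return sum(values) / len(values)


ring = [(i, i % 5 + 1) for i in range(1, 6)]            # 1->2->3->4->5->1
full = [(a, b) for a in range(1, 6) for b in range(a + 1, 6)]

uni = hop_counts(5, ring, directed=True)
bi = hop_counts(5, ring, directed=False)
fc = hop_counts(5, full, directed=False)
for name, hops in (("unidirectional ring", uni),
                   ("bidirectional ring", bi),
                   ("fully connected", fc)):
    print(name, max(hops.values()), mean(hops.values()))
```

With five nodes, a message on the unidirectional ring travels up to four
hops and 2.5 hops on average, against at most two hops on the bidirectional
ring and a single hop on the fully connected network.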

Finally, the outstanding performance of XFDPS.3 when the object and data
files named in a work request reside on the same node as the work request
should be noted. This is a clear demonstration of the savings possible with
this policy. One should also note that the performance of XFDPS.1 and that of
XFDPS.3 are identical when the named files are on a node different from the
one receiving the work request.

5.3  GROUP  2  EXPERIMENTS 

The first set of experiments demonstrates fundamental differences in the
performance of the models when handling individual work requests, but this
type of experiment can often be deceiving. When multiple work requests are
processed concurrently, the simultaneous demands on resources can result in
unexpected delays which cannot be anticipated with the data obtained from the
first set of experiments.

5.3.1 The Environment

The goal of this set of experiments is to simulate and examine a
production environment. It would be desirable to establish identical loads
for all experiments, but the nature of the problem makes this impossible. The
basic environment consists of a network of five nodes with ten user terminals
attached to each node. To provide an identical load, one would have to
guarantee that the work requests will be presented to the simulator in the
same order for each experiment. The control models, though, are composed of
autonomous components and by their design will process work requests on each
node at different rates, as demonstrated by the results of the group 1
experiments. This implies that even if the work requests at each node are
presented in the same order, the load provided to the simulator will be
different because the timing of work request arrivals may vary.

To clarify this point, consider the following example. Assume the loads
provided  to  nodes  1  and  2  are  as  shown  in  Figure  16.  This  figure  depicts  the 
order  in  which  the  work  requests  arrive  at  each  node.  Because  the  control 
models  process  work  requests  at  different  rates,  different  processing 
sequences  are  obtained  for  the  control  models.  Figure  17  depicts  the  sequence 
for  model  1  and  Figure  18  depicts  that  for  model  2.  Thus,  although  the  loads 
at  each  node  are  controlled,  it  is  impossible  to  control  the  sequence  of  work 
requests  on  all  nodes  collectively. 


Load at Node 1      Load at Node 2

     WR1                 WR5
     WR2                 WR6
     WR3                 WR7
     WR4                 WR8


Figure 16. Example of Loads Presented to Two Nodes

Figure 17. Sequence of Work Request Arrivals When Using Model 1

Figure 18. Sequence of Work Request Arrivals When Using Model 2



Since identical loads cannot be provided, we attempt to construct an
unbiased load. Each terminal issues its first work request at a time, measured
in seconds, corresponding to an integral value chosen at random from the
interval [1, 15]. After a work request has completed, the arrival time
(measured in seconds) of the next work request from the terminal is again
chosen by selecting a random value in the interval [1, 15] as the delay from
the termination of the previous work request. The work requests are chosen at
random from a common pool of work requests. Each work request in the pool is
of the type described earlier in section 5.1.2, naming object-data file pairs
in which both the object file and the data file reside on the same node. There
is an equal number of object-data file pairs on each node. Therefore, the
probability that a newly arrived work request names an object-data file pair
residing on node i is 1/5 for i = 1, ..., 5.

In order to obtain steady-state data, the taking of measurements is
delayed until a simulation time of 30 seconds after the start of the test.
This ensures that all terminals are active and engaged in their normal
activities. Measurements are then taken until 330 seconds into the simulation,
thus providing a measurement interval of 5 minutes. This provides observation
of the processing of over 200 work requests. Longer simulation intervals,
though desirable, are not practical due to the extensive computation necessary
to simulate the level of detail provided by the control models being examined.
Most runs have been observed to require over three hours of computing time on
a Prime 550. (The performance of the Prime 550 is approximately 80% of that
of an IBM 370/158 and 35% of that of a VAX 11/780 [Henk81].) Over 160
simulation runs have been made during the course of this research.
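
As a rough illustration of the measurement window, the following Python
sketch averages response times only for work requests inside the interval
from 30 to 330 seconds. The function name and the decision to classify a
request by its completion time are assumptions made for this example; the
report does not state which event the simulator uses.

```python
WARMUP_END = 30.0     # seconds; measurements begin here (from the text)
MEASURE_END = 330.0   # seconds; measurements stop here (from the text)


def average_response_time(requests):
    """Average response time over the measurement window.

    requests: iterable of (arrival_time, completion_time) pairs in seconds.
    Assumption: a request counts if it completes inside the window.
    Returns None when no request qualifies.
    """
    times = [done - arrived
             for arrived, done in requests
             if WARMUP_END <= done <= MEASURE_END]
    return sum(times) / len(times) if times else None
```

Requests that finish during the first 30 seconds or after 330 seconds are
simply left out of the average.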

In this set of experiments, the following three factors are varied: 1)
control model, 2) topology, and 3) bandwidth. Experiments utilizing all
possible combinations of these factors are run. The results of these
experiments are provided in Appendix 2.2.

5.3.2  Observations

The most distinguishing feature of the results of these tests is the
lack of significant variation in average response time for experiments
utilizing all models and topologies with bandwidths of 1200 bytes per second
or larger. In all cases, the LINK QUEUES have an average length of between one
and two messages, implying that the communication system does not prove to be
a bottleneck.

To demonstrate that the values for average response time could be
explained by delays due to the intranode multitasking of processes,
experiments utilizing the extremely high bandwidth of 2.5 million bytes per
second are conducted. The results are very similar to those obtained with
much lower transmission rates. In addition, a simulation of a single-node
network is conducted. This also results in an average response time that is
not significantly different. (The results of the single-node simulation are
provided in Appendix 2.3.)

In most cases when the bandwidth is lowered to values below 600 bytes
per second, a statistically significant increase in response time is
observed. In most cases, either XFDPS.2 or XFDPS.4 provided the smallest
average response time values. It is necessary, though, to reduce the bandwidth
to extremely low values in order to observe these differences, thus leading us
to conclude that, as far as contrasting the various models is concerned, the
data are rather inconclusive.

Finally, the results of the experiments with model XFDPS.2 provide one
further observation. Recall that in this model a single centralized file
system directory is maintained. All file system requests are handled by the
node housing this directory. Therefore, one would expect the performance of
this node to be somewhat degraded due to the control activity required to
satisfy the file system requests. The results, though, show that this is not
the case. The average response times for work requests arriving at the node
where the central directory is maintained (node 1) do not differ significantly
from those on other nodes. This result implies that the amount of file system
management work is negligible and therefore does not lead to any performance
degradation.

5.4  SINGLE  NODE  NETWORK  EXPERIMENTS 

5.4.1  The  Environment 

This set of experiments is considered separately from those described
above because its purpose is not to analyze the relative performance of the
control models. These experiments are designed to provide a standard against
which the other results can be compared in order to determine the impact of
distributed processing on average response time for work requests.

The configuration of the single node comprising the network in this set
of experiments is identical to that for each node in the other experiments.
The work requests name object-data file pairs, and the script for the object
file is the same as that employed in the first two groups of experiments.
Since there is no internode communication, the choice of the control model is
of no consequence, and therefore XFDPS.1 is arbitrarily selected.

5.4.2  Observations

Five simulations are conducted and the results of those runs are
presented in Appendix 2.3. The values for average response time from these
experiments are similar to those found in the first group of experiments when
bandwidths greater than 600 bytes/sec are used.

SECTION  6
CONCLUSIONS 

6.1  QUALITATIVE ASPECTS OF THE MODELS

The evaluation of the control models would be incomplete if
consideration were given only to the quantitative results provided by the
simulation experiments. It is also important to examine certain qualitative
aspects of the models which were not quantitatively evaluated. These aspects
include the ability to provide fault-tolerant operation (e.g., graceful
degradation and restoration), the ability for the system to expand gracefully,
and the ability to balance the system load.

6.1.1  XFDPS.1

The  XFDPS.1  model  is  a  truly  distributed  and  decentralized  model  of 
control.  In  this  model,  resources  are  partitioned  along  node  boundaries  and 
managed  by  components  residing  on  the  same  node  as  the  resource.  This  design 
enables  the  system  to  remain  in  operation  in  the  presence  of  a  failure.  In 
such  a  situation,  those  nodes  not  available  are  simply  not  contacted  when 
queries  concerning  resources  are  made.  The  failed  nodes  are  also  not 
considered  as  locations  for  the  execution  of  tasks  during  the  formulation  of 
the  work  distribution  and  resource  allocation  decision. 

This  model  of  control  requires  some  activity  on  the  part  of  all  nodes  in 
order  to  satisfy  each  work  request.  There  is  no  single  node  that  is  by  design 
supposed  to  receive  any  more  activity  than  any  other  node;  instead,  the  work 
is  spread  across  all  nodes.  In  addition,  global  information  for  the  work 
distribution  and  resource  allocation  decision  is  obtained  for  each  work 
request  as  it  is  processed.  This  global  data  enables  the  control  to  better 
balance the load across the network.

This control model is not without its problems. The global searches for
resources that occur for every work request may be unnecessary (e.g., in those
instances in which only local resources are required). Short local jobs
therefore suffer at the expense of the longer jobs utilizing non-local
resources.

6.1.2  XFDPS.2

XFDPS.2 utilizes a single centralized file system directory. On the
surface, this model appears to be simple to implement. A central directory is
maintained, and all file system queries are sent to the node housing that
directory. However, problems result when fault-tolerant operation is desired.
A single central directory can no longer be maintained because the loss of the
node housing the directory would be catastrophic. Alternative strategies
which provide for fault-tolerant operation (see, for example, Garcia-Molina's
technique described in [Garc79] for providing fault tolerance in a centralized
locking distributed data base system) significantly complicate the design of
the control as well as require a significant expenditure of resources in order
to recover from a failure. It should be noted that the simulation of XFDPS.2
does not account for the overhead required to provide fault-tolerant
operation. Therefore, the average work request response times observed in the
experiments are lower than would be expected if the necessary control features
for providing fault-tolerant operation were present.

Model XFDPS.2 also has problems with growth. When a new node is
introduced into the system, a large amount of work is required to update the
central directory to add the resources of the new node. This factor can be
quantified and will be the subject of future experiments.

6.1.3  XFDPS.3 

The XFDPS.3 model is similar to XFDPS.1. It differs in its policy for
obtaining file availability information. First a local search is made. If
all resources are found, they are utilized; otherwise, a global search for
resources is conducted. As described in Section 5, this model provides faster
response to work requests utilizing only local resources, as expected. Due to
its information gathering policy, however, the potential for utilizing distant
resources in order to balance the load is sacrificed because resource
availability on other nodes may never be considered.
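
The local-first policy can be illustrated with a small Python sketch. It is
a simplification under stated assumptions: each node's directory is modeled
as a plain set of available file names, and find_files is a hypothetical
helper, not a routine of the simulator.

```python
def find_files(needed, local_node, directories):
    """Local-first search in the style of XFDPS.3 (illustrative only).

    directories: dict mapping node id -> set of file names available there.
    Returns (locations, nodes_queried), where locations maps each needed
    file name to the list of nodes on which it was found.
    """
    local = directories[local_node]
    if all(name in local for name in needed):
        # Everything is local: no other node is ever consulted.
        return {name: [local_node] for name in needed}, [local_node]

    # Otherwise fall back to a global search over every other node.
    locations = {name: ([local_node] if name in local else [])
                 for name in needed}
    queried = [local_node]
    for node, files in directories.items():
        if node == local_node:
            continue
        queried.append(node)
        for name in needed:
            if name in files:
                locations[name].append(node)
    return locations, queried
```

When every file is found locally the remote copies are never discovered,
which is exactly why the opportunity to balance load is lost.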

6.1.4  XFDPS.4 

XFDPS.4 utilizes redundant copies of the file system directory on all
nodes. Access to the directory is restricted to the node possessing the
control vector that is passed among the nodes of the network. This model
tends to work somewhat like a batch system by delaying file system requests
until the control vector (CV) is received and processing these requests as a
batch.

The  presence  of  the  replicated  file  directory  implies  that  there  is  both 
duplication  of  information  storage  and  duplication  of  effort  as  consistency  is 
maintained  across  the  replicated  copies.  Since  file  system  requests  are 
delayed  until  the  CV  arrives,  jobs  with  very  short  service  times  may 
experience  unusually  large  response  times.  Finally,  as  with  XFDPS.2,  the 
introduction  of  a  new  node  requires  a  large  amount  of  work  in  order  to  update 
the  replicated  directories. 
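
The batching effect of the control vector can be sketched as follows. The
Node class, the circulate function, and the fixed ring order of CV passing
are all assumptions chosen for illustration; the report does not specify
the order in which the CV visits nodes.

```python
from collections import deque


class Node:
    """Hypothetical node that queues file system requests until the CV arrives."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.pending = deque()   # requests waiting for the control vector
        self.served = []         # requests already processed

    def submit(self, request):
        self.pending.append(request)

    def on_control_vector(self):
        """Process every queued request as one batch while holding the CV."""
        batch = list(self.pending)
        self.pending.clear()
        self.served.extend(batch)
        return batch


def circulate(nodes, rounds=1):
    """Pass the CV around the nodes in list order (assumed ring order)."""
    log = []
    for _ in range(rounds):
        for node in nodes:
            log.append((node.node_id, node.on_control_vector()))
    return log
```

A request submitted just after the CV has left a node waits for a full
circulation, which is the source of the large response times for short jobs.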

6.1.5  XFDPS.5 

XFDPS.5 is nearly identical to XFDPS.1, differing only in its policy of
not locking or in any way reserving resources prior to the formulation of a
work distribution and resource allocation decision. With this policy,
resources are not expected to be needlessly tied up in most cases. A problem
does exist if the chosen resources cannot be locked once selected for
allocation. In this case, a new resource allocation decision must be made and
already allocated and locked resources may need to be released.
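
A minimal sketch of this lock-after-decision policy, assuming a single
shared set of locked resource names and a list of alternative allocations in
preference order. Both lock_chosen and allocate are hypothetical names, and
the retry strategy shown is only one possible way to reach a new decision.

```python
def lock_chosen(chosen, locked):
    """Try to lock every chosen resource; undo partial locks on failure.

    locked: set of resource names currently locked by anyone (mutated).
    Returns True if all locks were obtained, False otherwise.
    """
    acquired = []
    for res in chosen:
        if res in locked:
            # A resource was taken after the decision: release what we hold.
            for held in acquired:
                locked.discard(held)
            return False
        locked.add(res)
        acquired.append(res)
    return True


def allocate(candidates, locked):
    """Return the first candidate allocation that can be fully locked."""
    for chosen in candidates:
        if lock_chosen(chosen, locked):
            return chosen
    return None
```

The release loop in lock_chosen corresponds to the case in which already
locked resources must be given back before a new decision is formulated.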

6.1.6  XFDPS.6 

XFDPS.6  differs  from  XFDPS.1  in  the  manner  in  which  the  task  graph  and 
task  activation  are  handled.  In  this  model,  the  tasks  of  a  work  request  that 
are  chosen  to  execute  on  the  same  node  are  presented  to  the  PROCESS  MANAGER  of 
the  selected  node  collectively.  A  task  graph  identifying  this  collection  of 
tasks  is  constructed  and  task  activation  and  termination  are  handled  by  the 
PROCESS  MANAGER.  Thus,  the  TASK  SET  MANAGER  need  send  only  one  message  to 
each  of  the  nodes  utilized  by  the  work  request  in  order  to  activate  all  tasks. 
In  addition,  only  one  termination  message  is  received  from  each  node.  Further 
savings  are  provided  because  the  PROCESS  MANAGER  on  the  node  where  the  tasks 
are  executing  can  immediately  release  the  resources  utilized  by  the  tasks  as 
each  task  terminates. 
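
The message saving can be made concrete with a small count. The sketch
assumes, as the contrast drawn above implies, that without grouping one
activation and one termination message are exchanged per task, and with
grouping one of each per node. The function control_messages is a
hypothetical helper for illustration.

```python
def control_messages(task_nodes, grouped):
    """Count activation plus termination messages for one work request.

    task_nodes: list giving the node chosen for each task.
    grouped:    True for per-node grouping (XFDPS.6 style), False for
                per-task messaging (assumed baseline).
    """
    units = len(set(task_nodes)) if grouped else len(task_nodes)
    return 2 * units   # one activation and one termination per unit
```

For six tasks placed on three distinct nodes, the count drops from twelve
messages to six under these assumptions.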

6.2  CONCLUSIONS

One must remember when analyzing the results in Appendix 2 that only
control message traffic is present during these simulation experiments. The
simulation experiments may be inconclusive in establishing the relative merits
of the various models. They do, though, demonstrate the utility of the fully
distributed processing concept. Even networks with communication links
possessing low bandwidths appear to be feasible candidates for fully
distributed processing if the message traffic is held mainly to control
messages. In particular, the experiment with the single-node network leads one
to expect that there will be little or no performance loss experienced with an
FDPS.

One  of  the  most  important  results  of  this  research  is  the  production  of 
a  simulator  for  the  analysis  of  fully  distributed  processing  systems.  The 
experience  gained  from  the  simulator  has  been  the  basis  for  the  proposal  of 
several interesting experiments to be conducted in the future.

SECTION  7

FUTURE  EXPERIMENTS 

This work has suggested several future experiments. First, networks of
increasing numbers of nodes, possibly 10-, 15-, and 20-node networks, will be
investigated to determine at what point the utility of the various models is
lost. In addition, experiments with both user message traffic and control
message traffic will be conducted in order to observe the sensitivity of
the various models in the presence of a busy communication system. Different
resource allocation and work distribution algorithms will be built into
the simulator in order to determine under what conditions each algorithm is
appropriate.

The  issue  of  the  dynamic  addition  and  deletion  of  resources  will  also  be 
examined.  This  will  demonstrate  how  gracefully  the  various  models  can  adapt 
to  a  growing  system.  These  experiments  will  also  examine  the  fault-tolerant 
capabilities of the various models.

APPENDIX  1

CONTROL  MODEL  PSEUDO  CODE 


1.1  PSEUDO  CODE  FOR  THE  XFDPS.1  CONTROL  MODEL 


1.1.1  System Initiator

process system_initiator;
{ Every node possesses one of these processes.  This process
  initiates a node in the network by assigning 'task_set_manager'
  processes to each connected user terminal, activating the
  'file_system_manager' process, and activating the
  'processor_utilization_manager' process. }

begin
   for every attached user terminal i do
      task_set_manager (TERMINAL, i);
   endfor;
   file_system_manager;
   processor_utilization_manager;
end system_initiator;

1.1.2  Task  Set  Manager 


 1:  process task_set_manager (case input_origin: inp_orig of
 2:                               TERMINAL: (term: terminal_address);
 3:                               CMNDFILE: (fd: file_descriptor)
 4:                            end);
 5:  { Every terminal and every executing command file are assigned
 6:    a 'task_set_manager' process.  When a process of this type
 7:    is activated, one of two sets of parameters is passed to it
 8:    depending upon the source of input to the process.  If the
 9:    process is assigned to handle input from a terminal, the
10:    address of the terminal is provided.  If the process is
11:    assigned to handle input from a command file, the file
12:    descriptor for the command file is provided. }
13:
14:  var
15:     tg: task_graph_pointer;
16:     command_line: string;
17:     msg: message_pointer;
18:
19:  begin
20:     while <either the terminal is attached or the end
21:            of the file has not been reached> do
22:        <get the next work request and store it in command_line>;
23:        new (tg);
24:        parse (command_line, tg);
25:        <send a message of type M1 (file availability request) to
26:         the file_system_manager on this node that contains the
27:         names of files needed for this work request>;
28:        <send a message of type M2 (processor utilization request)
29:         to the processor_utilization_manager on this node>;
30:        <wait for a message from processor_utilization_manager>;
31:        <store processor utilization information in tg^>;
32:        <wait for a message from file_system_manager>;
33:        <store file availability information in tg^>;
34:        if work_distributor_and_resource_allocator (tg) = ERR then
35:           { work distribution and resource allocation
36:             decision could not be made }
37:           <report error>;
38:           if input_origin = CMNDFILE then
39:              exit { leave the loop }
40:           else
41:              next { next iteration of loop }
42:           endif;
43:        endif;
44:        <send a message of type M3 (file lock and release request)
45:         to the file_system_manager on this node>;
46:        <wait for a message from file_system_manager>;
47:        if <all locks could not be applied> then
48:           <report error>;
49:           <send a message of type M4 (file release request)
50:            to the file_system_manager on this node>;
51:           if input_origin = CMNDFILE then
52:              exit { leave the loop }
53:           else
54:              next { next iteration of loop }
55:           endif;
56:        endif;
57:        for <all files chosen to be copied before execution> do
58:           <send a message of type M5 (file copy request) to the
59:            file_system_manager on this node>; endfor;
60:        if <files need copying> then
61:           <wait for a message from the file_system_manager>;
62:        endif;
63:        for <each node i chosen to execute parts of the
64:             work request> do
65:           <send a message of type M6 (process activation request)
66:            to the process_manager on node i>;
67:        endfor;
68:        repeat
69:           <wait for a termination message from a process_manager
70:            or a request to terminate the command file from
71:            the process_manager that activated this
72:            task_set_manager>;
73:           if <this is a termination message from a
74:               process_manager> then
75:              <mark the terminated task as completed in tg^>;
76:              <send a message of type M4 (file release request)
77:               to the file_system_manager on this node>;
78:              if <the termination status indicated that the
79:                  process terminated due to an error> then
80:                 for <each node i still running parts of this
81:                      work request> do
82:                    <send a message of type M7 (process kill request)
83:                     to the process_manager on node i>;
84:                 endfor;
85:              endif;
86:           else
87:              for <every task of the work request> do
88:                 if <the task has not completed> then
89:                    <send a message of type M7 (process kill request)
90:                     to the process_manager responsible for
91:                     the task>;
92:                 endif;
93:              endfor;
94:              break; { exit the loop }
95:           endif;
96:        until <all tasks have terminated>;
97:     endwhile;
98:  end task_set_manager;


1.1.3  File System Manager

  1:  process file_system_manager;
  2:  { Every node possesses one of these processes.  This process
  3:    satisfies various requests concerning the file system.
  4:    This is accomplished by communicating with the file_set_managers
  5:    on all nodes. }
  6:
  7:  var
  8:     msg: message_pointer;
  9:     favptr: file_availability_rec_pointer;
 10:     flrptr: file_lock_and_release_req_pointer;
 11:
 12:  begin
 13:     loop
 14:        <wait for a message of any type (let msg point to
 15:         the message)>;
 16:        case msg^.message_type of
 17:           M1: { file availability information request }
 18:              begin
 19:                 new (favptr);
 20:                 <insert the record favptr points to into the
 21:                  list of fav_recs>;
 22:                 <record the names of the files identified in msg^>;
 23:                 for <each node i> do
 24:                    <send a message of type M8 (file availability
 25:                     request) to the file_set_manager on node i
 26:                     that contains the names of all files>;
 27:                 endfor;
 28:              end;
 29:           M3: { file lock and release request }
 30:              begin
 31:                 new (flrptr);
 32:                 <insert the record flrptr points to into the
 33:                  list of flr_recs>;
 34:                 for <each node i> do
 35:                    <send a message of type M9 (file lock and
 36:                     release request) to the file_set_manager
 37:                     on node i that contains the names of all
 38:                     files from msg^ that are identified
 39:                     as being located at node i>;
 40:                 endfor;
 41:              end;
 42:           M4: { file release request }
 43:              begin
 44:                 for <each node i> do
 45:                    <send a message of type M10 (file release
 46:                     request) to the file_set_manager on
 47:                     node i that contains the names of all
 48:                     files from msg^ that are identified as
 49:                     being located at node i>;
 50:                 endfor;
 51:              end;
 52:           M5: { file copy request }
 53:              begin
 54:                 new (fmvptr);
 55:                 <insert the record fmvptr points to into the list
 56:                  of fmv_recs>;
 57:                 for <each file named in msg^> do
 58:                    <insert the file name into fmvptr^>;
 59:                    <send a message of type M11 (create file request)
 60:                     to the file_set_manager on the node where
 61:                     the file is to be copied>;
 62:                 endfor;
 63:              end;
 64:           M12: { file availability info from file_set_manager }
 65:              begin
 66:                 <let favptr point to the fav_rec that msg^
 67:                  is a response to>;
 68:                 <fill in the availability information in favptr^>;
 69:                 if <responses from all file_set_managers
 70:                     have been received> then
 71:                    <send a message of type M16 (file availability
 72:                     information) to the task_set_manager
 73:                     identified by a field of favptr^>;
 74:                 endif;
 75:              end;
 76:           M13: { file lock and release results from file_set_manager }
 77:              begin
 78:                 <let flrptr point to the flr_rec that msg^
 79:                  is a response to>;
 80:                 <fill in the lock and release results in flrptr^>;
 81:                 if <responses from all file_set_managers
 82:                     that were contacted have been received> then
 83:                    <send a message of type M17 (results of file
 84:                     lock and release request) to the task_set_manager
 85:                     identified by a field of flrptr^>;
 86:                 endif;
 87:              end;
 88:           M14: { result of file creation request from file_set_manager }
 89:              begin
 90:                 { This message is part of a series of messages
 91:                   used to copy a file from one node to another.
 92:                   At this point, file processes have been activated
 93:                   at both the sending and receiving nodes.  The
 94:                   next step is to send a signal to the sending
 95:                   process to begin transmission. }
 96:                 <send a message of type M18 (signal to begin copy)
 97:                  to the sending file process in the copy
 98:                  operation>;
 99:              end;
100:           M15: { copy completion signal from a file process }
101:              begin
102:                 <let fmvptr point to the fmv_rec that msg^
103:                  is a response to>;
104:                 <record in fmvptr^ that the copy operation
105:                  indicated in msg^ has been completed>;
106:                 if <all copy operations have been completed> then
107:                    <send a message of type M19 (results of file
108:                     copy request) to the task_set_manager
109:                     identified by a field of fmvptr^>;
110:                 endif;
111:              end;
112:        endcase;
113:     endloop;
114:  end file_system_manager;


1.1.4  Processor Utilization Manager

process processor_utilization_manager;
{ Every node possesses one of these processes.  This process
  records the latest processor utilization information received
  from each node's processor_utilization_monitor; it provides
  task_set_managers with this information on demand; and
  if it does not hear from a processor_utilization_monitor
  within a particular interval of time, it records the processor
  as down and attempts to contact that processor_utilization_monitor. }

var
   msg: message_pointer;
   pcutil: array [NODES_OF_THE_NET] of pc_utilization;

begin
   loop
      <wait for a message of any type (let msg point to
       the message)>;
      case msg^.message_type of
         M2: { pc utilization information request }
            begin
               <send a message of type M20 (pc utilization
                information) to the task_set_manager that
                sent the message and is identified in msg^>;
            end;
         M21: { pc utilization information from monitor }
            begin
               <record information in msg^ in pcutil [msg^.node]>;
               <reset deadman timer for information arriving
                from node msg^.node>;
            end;
         M22: { deadman timer signal - this indicates that a
                processor_utilization_monitor has not reported
                within the required time }
            begin
               pcutil [msg^.node] := NOT_AVAILABLE;
               <send a message of type M23 ("are you alive?"
                query) to the processor_utilization_monitor
                on node msg^.node>;
            end;
      endcase;
   endloop;
end processor_utilization_manager;
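
The deadman-timer bookkeeping performed by this process can be sketched in
Python. The UtilizationTable class, its method names, and the explicit now
argument are assumptions made for this illustration; the pseudo code relies
on timer messages (M22) rather than on polling.

```python
NOT_AVAILABLE = None   # stand-in for the pseudo code's NOT_AVAILABLE value


class UtilizationTable:
    """Hypothetical per-node table kept by a processor utilization manager."""

    def __init__(self, nodes, timeout):
        self.timeout = timeout                          # seconds of allowed silence
        self.util = {n: NOT_AVAILABLE for n in nodes}
        self.last_report = {n: 0.0 for n in nodes}

    def report(self, node, value, now):
        """Handle a utilization report from a monitor (type M21)."""
        self.util[node] = value
        self.last_report[node] = now

    def check_deadman(self, now):
        """Mark silent nodes as down and return them.

        The returned nodes are those that would receive an
        "are you alive?" query (type M23).
        """
        silent = [n for n, t in self.last_report.items()
                  if now - t > self.timeout]
        for n in silent:
            self.util[n] = NOT_AVAILABLE
        return silent
```

A task set manager asking for utilization data would then see
NOT_AVAILABLE for any node whose monitor has gone quiet.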


1.1.5  Processor Utilization Monitor

process processor_utilization_monitor;
{ Every node possesses one of these processes.  This process
  records various performance measurements and computes a
  processor utilization value that is periodically transmitted
  to all processor_utilization_managers. }

begin
   loop
      <gather performance measurements>;
      <compute processor utilization value>;
      for <each node i> do
         <send a message of type M21 (processor utilization
          information) to the processor_utilization_manager
          on node i>;
      endfor;
      <sleep until it is time to gather more measurements>;
      <wait until it is time to gather more measurements
       or a message from a processor_utilization_manager
       arrives>;
   endloop;
end processor_utilization_monitor;

1.1.6  Process Manager

process process_manager;
{ Every node possesses one of these processes.  This process
  manages the processes that are executing on its node. }

var
   pcbptr: process_control_block_pointer;
   process_name_table: process_name_to_pcbptr_map;
   msg: message_pointer;

begin
   loop
      <wait for the arrival of a message (let msg point
       to the message)>;
      case msg^.message_type of
         M6: { process activation request }
            begin
               if <process type is an object file> then
                  new (pcbptr);
                  <record process identifying information
                   and pcbptr in process_name_table>;
                  <fill in the necessary information in pcbptr^>;
                  <initiate the loading of the process>;
               else { the process is a command file }
                  task_set_manager (CMNDFILE, msg^.file_descriptor);
                  <record process identifying information
                   and task_set_manager identification in
                   process_name_table>;
               endif;
            end;
         M7: { process kill request }
            begin
               <find the process in process_name_table>;
               if <the process is an object file> then
                  <terminate the process>;
                  <unload the process>;
                  <dispose of the process control block>;
                  <send a message of type M24 (process
                   termination message) to the task_set_manager
                   that activated the process>;
               else { the process is a command file }
                  <send a message of type M25 (request to terminate
                   the execution of a command file) to the
                   task_set_manager executing this command file>;
               endif;
            end;
      endcase;
   endloop;
end process_manager;

1.1.7  File Set Manager

process file_set_manager;
{ Every node possesses one of these processes.  This process
  manages the files located on its node. }

var
   msg: message_pointer;
   file_directory: file_location_information;

begin
   loop
      <wait for the arrival of a message (let msg point
       to the message)>;
      case msg^.message_type of
         M8: { file availability request }
            begin
               for <each file named in msg^> do
                  <search for the file>;
                  if <the file was found> then
                     if <the file is free> then
                        <reserve the file>;
                        <record the desired access to the file>;
                        <note that the file is available>;
                     else
                        if <the desired access to the file
                            is READ> and <the access already
                            granted to the file is READ> then
                           <note that the file is available>;
                        else
                           <note that the file is not available>;
                        endif;
                     endif;
                  else
                     <note that the file is not available>;
                  endif;
               endfor;
               <send a message of type M12 (file availability
                information) to the file_system_manager
                on node msg^.node>;
            end;
         M9: { file lock and release request }
            begin
               for <each file in msg^> do
                  <search for the file>;
                  if <the file was found> then
                     <lock or release the file as requested>;
                  else
                     <note that the request could not be satisfied>;
                  endif;
               endfor;
               <send a message of type M13 (results of file lock
                and release request) to the file_system_manager
                on node msg^.node>;
            end;
         M10: { file release request }
            begin
               for <each file in msg^> do
                  <search for the file and release the lock on it>;
               endfor;
            end;
         M11: { file creation request }
            begin
               <create an entry for a new file in file_directory>;
               <activate a file process for the file>;
               <send a message of type M14 (results of file
                creation) to the file_system_manager on
                node msg^.node>;
            end;
      endcase;
   endloop;
end file_set_manager;
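
The availability rule applied to a file availability request (a free file
is reserved; a reserved file is still available only when both the granted
and the desired access are READ) can be sketched in Python. FileEntry and
check_availability are hypothetical names, and the dictionary-based
directory is an assumption made for the example.

```python
READ, WRITE = "READ", "WRITE"


class FileEntry:
    """Hypothetical directory entry for one file on this node."""

    def __init__(self):
        self.reserved = False
        self.access = None     # access recorded when the file was reserved


def check_availability(directory, name, desired):
    """Apply the file set manager's availability rule to one file.

    directory: dict mapping file name -> FileEntry.
    Returns True if the file is noted as available, False otherwise.
    """
    entry = directory.get(name)
    if entry is None:
        return False                       # file not found on this node
    if not entry.reserved:
        entry.reserved = True              # reserve the free file
        entry.access = desired             # record the desired access
        return True
    # Already reserved: only concurrent readers may share the file.
    return desired == READ and entry.access == READ
```

Note that, as in the pseudo code, a second reader is reported as available
without changing the recorded access.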


1.2  PSEUDO CODE FOR THE XFDPS.2 CONTROL MODEL

1.2.1  System Initiator

Same as XFDPS.1.

1.2.2  Task Set Manager

XFDPS.1 with the following changes:

25:        <send a message of type M1 (file availability request) to
26:         the file_system_manager on node 1 that contains the
27:         names of files needed for this work request>;

44:        <send a message of type M3 (file lock and release request)
45:         to the file_system_manager on node 1>;

76:              <send a message of type M4 (file release request)
77:               to the file_system_manager on node 1>;

1.2.3  File  System  Manager 
process file_system_manager;

{ This process resides on node 1 and satisfies various requests
  concerning the file system. This process maintains the
  centralized file system directory. }

var
   msg: message_pointer;

begin
   loop
      <wait for a message of any type (let msg point to
       the message)>;
      case msg^.message_type of

         M1: { file availability information request }
            begin
               for <each file named in msg^> do
                  <search for the file>;
                  if <the file was found> then
                     for <each node i> do
                        if <the file is free on node i> then
                           <reserve the file>;
                           <record the desired access to the file>;
                           <note that the file is available on
                            node i>;
                        else
                           if <the desired access to the file
                               is READ> and <the access already
                               granted to the file is READ> then
                              <note that the file is available on
                               node i>;
                           else
                              <note that the file is not available
                               on node i>;
                           endif;
                        endif;
                     endfor;
                  else
                     <note that the file is not available on
                      any node>;
                  endif;
               endfor;
               <send a message of type M12 (file availability
                information) to the task_set_manager requesting
                the information>;
            end;

         M3: { file lock and release request }
            begin
               for <each file in msg^> do
                  <search for the file>;
                  if <the file was found and is present
                      on the node specified> then
                     <lock or release the file as requested>;
                  else
                     <note that the request could not be satisfied>;
                  endif;
               endfor;
               <send a message of type M13 (results of file lock
                and release request) to the task_set_manager
                that made the request>;
            end;

         M4: { file release request }
            begin
               for <each file in msg^> do
                  <search for the file and release the lock on it>;
               endfor;
            end;

      endcase;
   endloop;
end file_system_manager;
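
The availability rule applied in the file availability request case above can be illustrated with a short Python sketch. It is not part of the original pseudo code. The class name Directory, its methods, and the dictionary layout are assumptions chosen for illustration, and release() handles one copy at a time as a simplification of the file release case.

```python
from dataclasses import dataclass, field

READ, WRITE = "READ", "WRITE"


@dataclass
class Directory:
    """Hypothetical centralized directory: file name -> {node: granted access or None}."""
    entries: dict = field(default_factory=dict)

    def add_copy(self, name, node):
        # Register a free (unreserved) copy of `name` on `node`.
        self.entries.setdefault(name, {})[node] = None

    def check_availability(self, name, desired):
        """Return {node: available?}; an empty dict means the file is unknown."""
        copies = self.entries.get(name)
        if copies is None:
            return {}  # the file is not available on any node
        result = {}
        for node, granted in copies.items():
            if granted is None:
                copies[node] = desired  # reserve and record the desired access
                result[node] = True
            elif desired == READ and granted == READ:
                result[node] = True     # readers may share a reserved copy
            else:
                result[node] = False
        return result

    def release(self, name, node):
        """Drop the access recorded for one copy, if that copy exists."""
        if name in self.entries and node in self.entries[name]:
            self.entries[name][node] = None


d = Directory()
d.add_copy("payroll", 1)
d.add_copy("payroll", 2)
first = d.check_availability("payroll", WRITE)   # reserves both free copies
second = d.check_availability("payroll", READ)   # refused: a writer holds them
```

A request for READ access succeeds on a reserved copy only when the access already granted is also READ, which is the one sharing case the pseudo code allows.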

1.2.4 Process Utilization Manager

Same as XFDPS.1.

1.2.5 Processor Utilization Monitor

Same as XFDPS.1.

1.2.6 Process Manager
Same as XFDPS.1.

1.3 PSEUDO CODE FOR THE XFDPS.3 CONTROL MODEL

1.3.1 System Initiator

Same as XFDPS.1.

1.3.2 Task Set Manager
Same as XFDPS.1.

1.3.3 File System Manager

XFDPS.1 with the following changes:

23:  <send a message of type M8 (file availability
24:   request) to the file_set_manager on the same node
25:   as this file_system_manager>;
26:
27:

69:  if <this response is from this node> and
70:     <all files have not been found available> then
71:     for <every other node i> do
72:        <send a message of type M8 (file availability
73:         request) to the file_set_manager on node i>;
74:     endfor;
74a: else
74b:    if <responses from all file_set_managers have been
74c:        received or all files have been found locally> then
74d:       <send a message of type M16 (file availability
74e:        information) to the task_set_manager identified
74f:        by a field of favptr^>;
74g:    endif;
74h: endif;
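
The search order described by these changes (ask the local file set manager first, and query the other nodes only when some file is still missing) can be sketched in Python. This is an illustrative simplification and not the report's code: the function name locate_files and the file_sets mapping are assumptions, the exchange is written as synchronous calls rather than messages, and a file counts as available whenever a node holds it.

```python
def locate_files(wanted, local_node, file_sets):
    """Return (found, queried): file -> node, and the nodes asked, in order.

    `file_sets` maps node id -> set of file names held there; it stands in
    for the file_set_manager on each node (an assumption of this sketch).
    """
    found = {}
    queried = [local_node]
    # Step 1: ask the file_set_manager on the same node.
    for name in wanted:
        if name in file_sets.get(local_node, set()):
            found[name] = local_node
    # Step 2: only if something is still missing, ask every other node.
    if len(found) < len(wanted):
        for node in sorted(file_sets):
            if node == local_node:
                continue
            queried.append(node)
            for name in wanted:
                if name not in found and name in file_sets[node]:
                    found[name] = node
    # Step 3: all responses are in; the reply to the task set manager goes out here.
    return found, queried


file_sets = {1: {"a", "b"}, 2: {"c"}, 3: set()}
local_only = locate_files(["a", "b"], 1, file_sets)
needs_remote = locate_files(["a", "c"], 1, file_sets)
```

When every file is found locally, no other node is contacted, which is the saving this control model aims for.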

1.3.4 Process Utilization Manager

Same as XFDPS.1.

1.3.5 Processor Utilization Monitor
Same as XFDPS.1.

1.3.6 Process Manager
Same as XFDPS.1.

1.3.7 File Set Manager
Same as XFDPS.1.

1.4 PSEUDO CODE FOR THE XFDPS.4 CONTROL MODEL

1.4.1 System Initiator

Same as XFDPS.1.

1.4.2 Task Set Manager
Same as XFDPS.1.

1.4.3 File System Manager

process file_system_manager;

{ Every node possesses one of these processes. This process
  satisfies various requests concerning the file system and
  helps maintain the redundant copies of the file system
  directory. }

var
   msg: message_pointer;

begin
   loop
      <wait for a message of any type (let msg point to
       the message)>;
      case msg^.message_type of

         M1, M3, M4: { availability, lock, and release requests }
            begin
               <place the message on the queue of file system
                requests arriving at this node>;
            end;

         CV: { control vector }
            begin
               while <the file system request queue is
                      not empty> do
                  <remove a message from the queue (let msg point
                   to the message)>;
                  case msg^.message_type of

                     M1: { file availability information request }
                        begin
                           for <each file named in msg^> do
                              <search for the file>;
                              if <the file was found> then
                                 for <each node i> do
                                    if <the file is free on node i> then
                                       <reserve the file>;
                                       <record the desired access to the file>;
                                       <note that the file is available on
                                        node i>;
                                    else
                                       if <the desired access to the file
                                           is READ> and <the access already
                                           granted to the file is READ> then
                                          <note that the file is available on
                                           node i>;
                                       else
                                          <note that the file is not available
                                           on node i>;
                                       endif;
                                    endif;
                                 endfor;
                              else
                                 <note that the file is not available on
                                  any node>;
                              endif;
                           endfor;
                           <send a message of type M12 (file availability
                            information) to the task_set_manager requesting
                            the information>;
                        end;

                     M3: { file lock and release request }
                        begin
                           for <each file in msg^> do
                              <search for the file>;
                              if <the file was found and is present
                                  on the node specified> then
                                 <lock or release the file as requested>;
                              else
                                 <note that the request could not be satisfied>;
                              endif;
                           endfor;
                           <send a message of type M13 (results of file lock
                            and release request) to the task_set_manager
                            that made the request>;
                        end;

                     M4: { file release request }
                        begin
                           for <each file in msg^> do
                              <search for the file and release the lock on it>;
                           endfor;
                        end;

                  endcase;
               endwhile;
               <send a message of type UPV (update vector) to the
                next node (according to the predetermined
                ordering of nodes) containing the changes just
                made to the file system directory>;
            end;

         UPV: { update vector }
            begin
               if <this UPV was originated by this node> then
                  <send a message of type CV (control vector) to
                   the next node (according to the predetermined
                   ordering of nodes)>;
               else
                  <update the file system directory>;
                  <send the message of type UPV (update vector)
                   to the next node (according to the predetermined
                   ordering of nodes)>;
               endif;
            end;

      endcase;
   endloop;
end file_system_manager;
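
The circulation of the control vector (CV) and update vector (UPV) can be sketched in Python as follows. This is a minimal single-threaded illustration and not the report's implementation: the Node class, the request tuples, and the circulate function are assumptions, message passing is replaced by direct calls, and only lock and release requests are modeled.

```python
from collections import deque


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.directory = {}   # replica: file name -> lock holder or None
        self.queue = deque()  # pending (operation, file, requester) requests

    def drain(self):
        """Handle queued requests while holding the CV; return the changes made."""
        changes = {}
        while self.queue:
            op, name, who = self.queue.popleft()
            if op == "lock" and self.directory.get(name) is None:
                changes[name] = who
            elif op == "release" and self.directory.get(name) == who:
                changes[name] = None
            else:
                continue  # the request could not be satisfied
            self.directory[name] = changes[name]
        return changes


def circulate(nodes, rounds=1):
    """Pass the CV around the ring `rounds` times, sending a UPV after each stop."""
    n = len(nodes)
    for _ in range(rounds):
        for holder in range(n):
            changes = nodes[holder].drain()
            # The UPV visits every other node in ring order and is applied there.
            for step in range(1, n):
                nodes[(holder + step) % n].directory.update(changes)
            # The UPV is back at its originator, which now forwards the CV.


nodes = [Node(i) for i in range(3)]
nodes[1].queue.append(("lock", "f", 1))
nodes[2].queue.append(("lock", "f", 2))
circulate(nodes)
```

Because only the CV holder changes the directory, and its changes reach every replica before the CV moves on, node 2 sees the lock taken by node 1 and its conflicting request is refused.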

1.4.4 Process Utilization Manager

Same as XFDPS.1.

1.4.5 Processor Utilization Monitor

Same as XFDPS.1.

1.4.6 Process Manager
Same as XFDPS.1.

1.5 PSEUDO CODE FOR THE XFDPS.5 CONTROL MODEL

1.5.1 System Initiator
Same as XFDPS.1.

1.5.2 Task Set Manager
Same as XFDPS.1.

1.5.3 File System Manager

Same as XFDPS.1.

1.5.4 Process Utilization Manager
Same as XFDPS.1.

1.5.5 Processor Utilization Monitor

Same as XFDPS.1.

1.5.6 Process Manager
Same as XFDPS.1.

1.5.7 File Set Manager

XFDPS.1  with  the  following  changes: 

20:  <note  that  the  file  is  available>; 

21: 

22: 


1.6 PSEUDO CODE FOR THE XFDPS.6 CONTROL MODEL

1.6.1  System  Initiator 

Same  as  XFDPS.1. 

1.6.2  Task  Set  Manager 

XFDPS.1  with  the  following  changes: 

75: for <each task in the message> do
76:    <mark the task as completed in tg^>;
77: endfor;

87: for <every node i still executing parts of the work
88:      request> do
89:    <send a message of type M7 (process kill request)
90:     to the process_manager on node i>;
91: endfor;
92:
93:
1.6.3 File System Manager
Same as XFDPS.1.

1.6.4 Process Utilization Manager

Same as XFDPS.1.

1.6.5 Processor Utilization Monitor
Same as XFDPS.1.

1.6.6 Process Manager

process process_manager;

{ Every node possesses one of these processes. This process
  manages the processes that are executing on its node. }

var
   pcbptr: process_control_block_pointer;
   process_name_table: process_name_to_pcbptr_map;
   subtg: task_graph_pointer;
   msg: message_pointer;

begin
   loop
      <wait for the arrival of a message (let msg point
       to the message)>;
      case msg^.message_type of

         M6: { process activation request }
            begin
               new(subtg);
               for <each task i in msg^> do
                  <record task i in subtg^>;
                  if <task i names an object file> then
                     new(pcbptr);
                     <record process identifying information
                      and pcbptr in process_name_table>;
                     <fill in the necessary information in pcbptr^>;
                     <initiate the loading of the process>;
                  else
                     task_set_manager(CMNDFILE, msg^.file_descriptor);
                     <record process identifying information
                      and task_set_manager identification in
                      process_name_table>;
                  endif;
               endfor;
               <link subtg^ onto the list of subtaskgraphs executing
                on this node>;
            end;

         M7: { process kill request }
            begin
               <find the subtaskgraph in the list of
                subtaskgraphs executing on this node (let
                subtg point to the subtaskgraph)>;
               for <each task i in subtg^> do
                  if <task i has not completed> then
                     if <task i names an object file> then
                        <terminate the process>;
                        <unload the process>;
                        <dispose of the process control block>;
                        <mark task i as terminated>;
                     else { the process is a command file }
                        <send a message of type M25 (request to terminate
                         the execution of a command file) to the
                         task_set_manager executing this command file>;
                     endif;
                  endif;
               endfor;
               if <all the tasks in subtg^ have completed> then
                  <send a message of type M24 (subtaskgraph
                   termination message) to the task_set_manager
                   that activated the subtaskgraph>;
                  <remove subtg^ from the list of subgraphs
                   executing on this node>;
                  dispose(subtg);
               endif;
            end;

      endcase;
   endloop;
end process_manager;
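
The handling of a process kill request can be sketched in Python as follows. This is an illustrative sketch under stated assumptions and not the report's code: the Task class, the state names, and the message tags in the returned list are hypothetical, and a terminated task is treated as completed when deciding whether the subtaskgraph has finished.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    is_object_file: bool    # False means the task names a command file
    state: str = "running"  # "running", "completed", or "terminated"


def handle_kill(subtg):
    """Apply the kill logic to a list of Task objects; return messages to send."""
    outbox = []
    for task in subtg:
        if task.state != "running":
            continue  # nothing to do for a task that has already finished
        if task.is_object_file:
            # Stands in for terminate, unload, and dispose of the control block.
            task.state = "terminated"
        else:
            # A nested task set manager runs this command file; ask it to stop.
            outbox.append(("terminate_command_file", task.name))
    if all(task.state != "running" for task in subtg):
        outbox.append(("subtaskgraph_terminated", None))
    return outbox


graph = [Task("a", True), Task("b", True, "completed")]
messages = handle_kill(graph)
```

A subtaskgraph that still contains a running command file produces no termination message; under this sketch's assumptions that message would follow only after the nested task set manager has stopped.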
APPENDIX 2


SIMULATION  RESULTS 


2.1  RESULTS  OF  GROUP  1  EXPERIMENTS 
2.3 RESULTS OF A SIMPLE MODE SIMULATION

Average Work Request Response Time for
a Single Node Network

Mean: 44.1 seconds
Standard Deviation: 0.38